Reproducible Builds in December 2019

View all our monthly reports


Welcome to the December 2019 report from the Reproducible Builds project!

In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising that identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised.
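To make the consensus idea concrete, here is a minimal sketch in Python. It assumes that several independent rebuilders have each produced an artifact from the same source, and it compares their SHA-256 digests. The builder names and artifact bytes are hypothetical placeholders, and this is an illustration of the principle rather than any tool the project ships.

```python
import hashlib
from collections import Counter


def sha256_hex(artifact: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(artifact).hexdigest()


def consensus(reports: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the most common digest and the rebuilders that disagree with it."""
    digests = {name: sha256_hex(data) for name, data in reports.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    dissenters = sorted(name for name, d in digests.items() if d != majority)
    return majority, dissenters


# Hypothetical artifacts from three independent rebuilders.
reports = {
    "builder-a": b"\x7fELF...identical bytes",
    "builder-b": b"\x7fELF...identical bytes",
    "builder-c": b"\x7fELF...tampered bytes",
}
majority_digest, dissenters = consensus(reports)
print("Rebuilders that disagree:", dissenters)
```

If a build is reproducible, every honest rebuilder reports the same digest, so any dissenting result points at either a non-determinism bug or a compromised build environment.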

In this report for December, we cover:

  • Media coverage: A Google whitepaper, The Update Framework graduates within the Cloud Native Computing Foundation, etc.
  • Reproducible Builds Summit 2019: What happened at our recent meetup?
  • Distribution work: The latest reports from Arch, Debian and openSUSE, etc.
  • Software development: Patches, patches, patches…
  • Mailing list summary
  • Contact: How to contribute, etc.

If you are interested in contributing to our project, please visit the Contribute page on our website.


Media coverage

Google published Binary Authorization for Borg, a whitepaper on how they reduce exposure of user data to unauthorised code as well as methods for verifying code provenance using their Borg cluster manager. In particular, the paper notes how they attempt to limit their “insider risk”, i.e. the potential for internal personnel to use organisational credentials or knowledge to perform malicious activities.

The Linux Foundation announced that The Update Framework (TUF) has graduated within the Cloud Native Computing Foundation (CNCF) and thus becomes the first specification and first security-focused project to reach the highest maturity level in that group. TUF is a technology for securing software update systems that was initially developed by Justin Cappos at the NYU Tandon School of Engineering.

Andrew “bunnie” Huang published a blog post asking Can We Build Trustable Hardware? Whilst it concludes pessimistically that “open hardware is precisely as trustworthy as closed hardware”, it does mention that reproducible builds can:

Enable any third-party auditor to download, build, and confirm that the program a user is downloading matches the intent of the developers.

At the 36th Chaos Communication Congress (36C3) in Leipzig, Hannes Mehnert from the MirageOS project gave a presentation called Leaving legacy behind. The talk presents MirageOS as a potential alternative, minimalist approach to security, and includes a section on reproducible builds (at 38m41s).


Reproducible Builds Summit 2019

We held our fifth annual Reproducible Builds summit between the 1st and 8th December at Priscilla, Queen of the Medina in Marrakesh, Morocco.

The aim of the meeting was to spend time discussing and working on Reproducible Builds with a widely diverse agenda, and the event was a huge success.

During our time together, we updated and exchanged the status of reproducible builds in our respective projects, improved collaboration between and within these efforts, expanded the scope and reach of reproducible builds to yet more interested parties, established and continued strategic long-term thinking in a way not typically possible via remote channels, and brainstormed designs for tools to enable end-users to get the most benefit from reproducible builds.

Beyond these achievements, kpcyrd made a breakthrough in Alpine Linux during the hacking sessions by producing the first reproducible package in this operating system, specifically py3-uritemplate. After this, progress accelerated, and by the end of our meeting the reproducibility status in Alpine had reached 94%. In addition, Jelle van der Waa, Mattia Rizzolo and Paul Spooren discussed and implemented substantial changes to the database that underpins the testing framework powering tests.reproducible-builds.org. These changes abstract the schema in a distribution-agnostic way, for example to allow submitting the results of attempts to verify officially distributed Arch Linux packages.

Lastly, Jan Nieuwenhuizen, David Terry and Vagrant Cascadian used three entirely separate distributions (GNU Guix, NixOS and Debian) to produce a bit-for-bit identical GNU Mes binary despite using three different major versions of GCC and other toolchain components to build an initial binary, which was then used to build a final, bit-for-bit identical, binary of Mes.

The event was held at Priscilla, Queen of the Medina in Marrakesh, a location sui generis that stands for gender equality, female empowerment and the engagement of vulnerable communities locally through cultural activism. The event was open to anybody interested in working on Reproducible Builds issues, with or without prior experience.

A number of reports and blog posts have already been written, including for:

… as well as a number of tweets including ones from Jan Nieuwenhuizen celebrating progress in GNU Guix [] and Hannes [].


Distribution work

Within Debian, Chris Lamb categorised a large number of packages and issues in the Reproducible Builds notes.git repository, including identifying and creating markdown_random_email_address_html_entities and nondeterministic_devhelp_documentation_generated_by_gtk_doc.

In openSUSE, Bernhard published his monthly Reproducible Builds status update and filed the following patches:

Bernhard also filed bugs against:

The Yocto Project announced that it is running continuous tests on the reproducibility of its output, which can be observed through the oe-selftest runs on their build server. This was previously limited to just the mini images but has now been extended to the larger graphical images. The test framework is available for end users to use against their own builds. Of particular interest is the production of binary-identical results, despite arbitrary build paths, to allow more efficient builds through reuse of previously built objects, a topic covered in more depth in a recent LWN article.
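As a rough illustration of why build paths matter, the toy Python sketch below models a "compiler" that records the source file's directory in its output, much as debug information often does. Everything here (the `compile_unit` function, the directories and the `/build` placeholder) is hypothetical and is not how the Yocto Project implements this. Real toolchains offer comparable remapping, for example GCC's `-ffile-prefix-map` option.

```python
import hashlib


def compile_unit(source: str, build_dir: str, map_prefix: bool) -> bytes:
    """Toy 'compiler' that embeds the source file's directory in its output.

    With map_prefix=True the real build directory is replaced by a fixed
    placeholder, so the output no longer depends on where the build ran.
    """
    recorded_dir = "/build" if map_prefix else build_dir
    return f"{recorded_dir}/main.c\0{source}".encode()


def digest(data: bytes) -> str:
    """SHA-256 digest used to compare two build outputs."""
    return hashlib.sha256(data).hexdigest()


src = "int main(void) { return 0; }"

# Same source built in two different (hypothetical) directories.
plain_a = compile_unit(src, "/home/alice/work", map_prefix=False)
plain_b = compile_unit(src, "/tmp/ci-1234", map_prefix=False)
mapped_a = compile_unit(src, "/home/alice/work", map_prefix=True)
mapped_b = compile_unit(src, "/tmp/ci-1234", map_prefix=True)

print("unmapped outputs identical:", digest(plain_a) == digest(plain_b))
print("mapped outputs identical:  ", digest(mapped_a) == digest(mapped_b))
```

Once the output is independent of the build path, an object built in one location can safely stand in for the same object built elsewhere, which is what makes reuse of previously built objects possible.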

In Arch Linux, the database structure on tests.reproducible-builds.org was changed and the testing jobs were updated to match. Work has also started on a verification test job which rebuilds the officially released packages and verifies whether they are reproducible. In the “hacking” time after our recent summit, several key packages were made reproducible, raising the number of reproducible packages by approximately 1.5%. For example, libxslt was patched with the patch originating from Debian and openSUSE.


Software development

diffoscope

diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.

This month, diffoscope version 134 was uploaded to Debian unstable by Chris Lamb. He also made a number of changes to diffoscope itself, including:

  • Always pass a filename with a .zip extension to zipnote, otherwise it will return with a UNIX exit code of 9 and we fall back to displaying a binary difference for the entire file. []
  • Include the libarchive file listing for ISO images to ensure that timestamps – and not just dates – are visible in any difference. (#81)
  • Ensure that our autopkgtests are run with our pyproject.toml present for the correct black source code formatter settings. (#945993)
  • Rename the text_option_with_stdiout test to text_option_with_stdout [] and tidy some unnecessary boolean logic in the ISO9660 tests [].
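The zipnote change above amounts to making sure an external tool always sees a filename it is willing to accept. The following is a minimal Python sketch of that idea, not diffoscope's actual implementation: the helper name `path_with_zip_extension` is hypothetical, and it simply copies the input to a `.zip`-suffixed path in a working directory when needed.

```python
import os
import shutil
import tempfile


def path_with_zip_extension(path: str, workdir: str) -> str:
    """Return a path ending in .zip that holds the same content as `path`.

    If `path` already has a .zip extension it is returned unchanged;
    otherwise the file is copied into `workdir` under a .zip name.
    """
    if path.lower().endswith(".zip"):
        return path
    target = os.path.join(workdir, os.path.basename(path) + ".zip")
    shutil.copyfile(path, target)
    return target


with tempfile.TemporaryDirectory() as workdir:
    # A ZIP-format file with a different extension (here, an empty archive).
    original = os.path.join(workdir, "example.jar")
    with open(original, "wb") as f:
        f.write(b"PK\x05\x06" + b"\x00" * 18)

    renamed = path_with_zip_extension(original, workdir)

    with open(original, "rb") as a, open(renamed, "rb") as b:
        same_content = a.read() == b.read()

print(os.path.basename(renamed), same_content)
```

The resulting path could then be handed to the external tool in place of the original one.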

In addition, Eli Schwartz fixed an error in the handling of the progress bar [] and Vagrant Cascadian added an external tool reference for the zstd compression format for GNU Guix [] as well as updated the version to 133 [] and 134 [] in that distribution.

Project website & documentation

There was more work performed on our website this month, including:

In addition, Paul Spooren added a new page giving an overview of our continuous tests [], Hervé Boutemy made a number of improvements to our Java and JVM documentation, expanding and clarifying various definitions as well as adding external links [][][][], and Mariana Moreira added a .jekyll-cache entry to the .gitignore file [].

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Test framework

We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

  • Holger Levsen:

    • Alpine:

      • Indicate where Alpine is being built on the node overview page. []
      • Turn off debugging output. []
      • Sleep longer if no packages are to be built. []
    • Misc:

      • Add some help text to our script to powercycle IONOS (née Profitbricks) nodes. []
      • Install mosh everywhere. []
      • Only install ripgrep on Debian nodes. []
  • Mattia Rizzolo:

    • Arch Linux:

      • Normalise the suite names in the database. [][][][][]
      • Drop an unneeded line in the scheduler. []
    • Debian:

      • Fix a number of SQL errors. [][][][]
      • Use the debian.debian_support Python library over apt_pkg to perform version comparisons. []
    • Misc:

      • Permit other distributions to use our web-based package scheduling script. []
      • Reformat our power-cycling script using Black and use the Python logging module. []
      • Introduce a dsources database view to simplify some queries [] and add a build_type field to support both “doublerebuilds” and verification rebuilds [].
      • Move (almost) all the timestamps in the database schema from raw strings to “real” timestamp data types. []
      • Only block bots on jenkins.debian.net and tests.reproducible-builds.org, not any other sites. []

  • kpcyrd (for Alpine Linux):

    • Patch/install the abuild utility to one that is reproducible. [][][][]
    • Bump the number of build workers and collect garbage more frequently. [][][][]
    • Classify and display build results consistently. [][][]
    • Ensure that tmux and ripgrep are installed. [][]
    • Support building packages in the future. [][][]

Lastly, Paul Spooren removed the project overview from the bottom-left of the generated pages [] and the usual node maintenance was performed by Holger Levsen [] and Mattia Rizzolo [][].
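To illustrate why moving timestamps from raw strings to real timestamp types (mentioned in the list above) is useful, here is a small Python sketch. The string format and the example values are assumptions made for illustration, as the report does not describe the actual schema. Once parsed, values can be subtracted and compared directly instead of being manipulated as text.

```python
from datetime import datetime, timezone

# Assumed format of the raw strings; the real schema may differ.
RAW_FORMAT = "%Y-%m-%d %H:%M"


def parse_build_date(raw: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string into a timezone-aware UTC datetime."""
    return datetime.strptime(raw, RAW_FORMAT).replace(tzinfo=timezone.utc)


# Hypothetical start and end times of a single build.
started = parse_build_date("2019-12-05 09:30")
finished = parse_build_date("2019-12-05 11:05")

# Arithmetic and ordering work directly on the parsed values.
duration = finished - started
print("build took", int(duration.total_seconds() // 60), "minutes")
```

Storing the typed value in the database lets the database itself perform this kind of comparison and arithmetic in queries.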


Mailing list summary

There was considerable activity on our mailing list this month. Firstly, Bernhard M. Wiedemann posted a thread asking What is the goal of reproducible builds? in order to encourage refinements, extra questions and other contributions to what an end-user experience of reproducible builds should or even could look like.

Eli Schwartz then resurrected a previous thread titled Progress in rpm and openSUSE in 2019 to clarify some points around Arch Linux and Python package installation. Hans-Christoph Steiner followed up on a separate thread originally started by Hervé Boutemy announcing the status of .buildinfo file support in the Java ecosystem, and Paul Spooren then informed the list that Google Summer of Code is now looking for projects for the latest cohort.

Lastly, Lars Wirzenius enquired about the status of Reproducible system images which resulted in a large number of responses.


Contact

If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. Alternatively, you can get in touch with us via:



This month’s report was written by Arnout Engelen, Bernhard M. Wiedemann, Chris Lamb, Hervé Boutemy, Holger Levsen, Jelle van der Waa, Lukas Puehringer and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



