Welcome to the December 2019 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.
The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
In this report for December, we cover:
- Media coverage — A Google whitepaper, The Update Framework graduates within the Cloud Native Computing Foundation, etc.
- Reproducible Builds Summit 2019 — What happened at our recent meetup?
- Distribution work — The latest reports from Arch, Debian and openSUSE, etc.
- Software development — Patches, patches, patches…
- Mailing list summary
- Contact — How to contribute, etc.
If you are interested in contributing to our project, please visit the Contribute page on our website.
Media coverage
Google published Binary Authorization for Borg, a whitepaper on how they reduce exposure of user data to unauthorised code as well as methods for verifying code provenance using their Borg cluster manager. In particular, the paper notes how they attempt to limit their “insider risk”, i.e. the potential for internal personnel to use organisational credentials or knowledge to perform malicious activities.
The Linux Foundation announced that The Update Framework (TUF) has graduated within the Cloud Native Computing Foundation (CNCF) and thus becomes the first specification and first security-focused project to reach the highest maturity level in that group. TUF is a technology that secures software update systems initially developed by Justin Cappos at the NYU Tandon School of Engineering.
Andrew “bunnie” Huang published a blog post asking Can We Build Trustable Hardware? Whilst it concludes pessimistically that “open hardware is precisely as trustworthy as closed hardware”, it does mention that reproducible builds can:
Enable any third-party auditor to download, build, and confirm that the program a user is downloading matches the intent of the developers.
At the 36th Chaos Communication Congress (36C3) in Leipzig, Hannes Mehnert from the MirageOS project gave a presentation called Leaving legacy behind, which discusses in general terms how the MirageOS system offers a potential alternative and minimalist approach to security, and which includes a section on reproducible builds (at 38m41s).
Reproducible Builds Summit 2019
The aim of the meeting was to spend time discussing and working on Reproducible Builds with a widely diverse agenda, and the event was a huge success.
During our time together, we updated and exchanged the status of reproducible builds in our respective projects, improved collaboration between and within these efforts, expanded the scope and reach of reproducible builds to yet more interested parties, established and continued strategic long-term thinking in a way not typically possible via remote channels, and brainstormed designs for tools to enable end-users to get the most benefit from reproducible builds.
Outside of these achievements, in the hacking sessions kpcyrd made a breakthrough in Alpine Linux by producing the first reproducible package in this operating system, specifically py3-uritemplate. After this, progress accelerated and by the end of our meeting the reproducibility status in Alpine had reached 94%. In addition, Jelle van der Waa, Mattia Rizzolo and Paul Spooren discussed and implemented substantial changes to the database that underpins the testing framework that powers tests.reproducible-builds.org in order to abstract the schema in a distribution-agnostic way, for example to allow submitting the results of attempts to verify officially distributed Arch Linux packages.
Lastly, Jan Nieuwenhuizen, David Terry and Vagrant Cascadian used three entirely-separate distributions (GNU Guix, NixOS and Debian) to produce a bit-for-bit identical GNU Mes binary despite using three different major versions of GCC and other toolchain components to build an initial binary, which was then used to build a final, bit-for-bit identical, binary of Mes.
The event was held at Priscilla, Queen of the Medina in Marrakesh, a location sui generis that stands for gender equality, female empowerment and the engagement of vulnerable communities locally through cultural activism. The event was open to anybody interested in working on Reproducible Builds issues, with or without prior experience.
A number of reports and blog posts have already been written about the event.
Distribution work
Within Debian, Chris Lamb categorised a large number of packages and issues in the Reproducible Builds notes.git repository, including identifying and creating
- pip install reproducible, avoid trouble with Zip order & mtime)
- rpmlint-mini (sort Python compile file list)
- rubygem-ronn (Ruby date, submitted upstream with updated patch)
- readdir; already upstream)
Bernhard also filed bugs against:
- libmicro (Link-Time Optimisation causing unreproducible object files; fix by Martin Pluskal)
- python-swifter (report failure to build on single-core CPUs)
- tesseract-ocr (report variations via their build machine’s CPU)
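Several of the items above concern file ordering and Zip timestamps. As a rough illustration of why these matter, the following sketch writes a Zip archive whose bytes depend only on the names and contents of its members. The function name write_deterministic_zip and the fixed date are assumptions made for this example; this is not the code from any of the patches mentioned above.

```python
import io
import zipfile

# Fixed timestamp for every member (the Zip format cannot store dates before 1980).
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def write_deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Return Zip archive bytes that depend only on the names and contents in `files`."""
    buffer = io.BytesIO()
    # Store members uncompressed so the output does not depend on the zlib version.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        # Sort the names so member order never depends on dict or readdir order.
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            # Pin the permission bits as well, since they are stored in the archive.
            info.external_attr = 0o644 << 16
            archive.writestr(info, files[name])
    return buffer.getvalue()


# The same files supplied in a different order produce the same archive bytes.
first = write_deterministic_zip({"b.txt": b"beta", "a.txt": b"alpha"})
second = write_deterministic_zip({"a.txt": b"alpha", "b.txt": b"beta"})
print(first == second)
```

Without the sort and the pinned timestamp, the archive would typically vary with filesystem ordering and the time of the build, even when every input file is identical.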
The Yocto Project announced that it is running continuous tests on the reproducibility of its output, which can be observed through the oe-selftest runs on their build server. This was previously limited to just the mini images but has now been extended to the larger graphical images. The test framework is available for end users to use against their own builds. Of particular interest is the production of binary-identical results, despite arbitrary build paths, to allow more efficient builds through reuse of previously built objects, a topic covered in more depth in a recent LWN article.
In Arch Linux, the database structure on tests.reproducible-builds.org was changed and the testing jobs were updated to match. Work has also started on a verification test job which rebuilds the officially released packages and verifies whether they are reproducible. In the “hacking” time after our recent summit, several key packages were made reproducible, raising the number of reproducible packages by approximately 1.5%. For example, libxslt was patched with the patch originating from Debian and openSUSE.
Software development
diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
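Before reaching for a tool like diffoscope, a first step is usually to find out which files differ between two builds at all. The sketch below does this by hashing every file in two build output directories. The names hash_tree and differing_files are hypothetical helpers written for this illustration, and the approach is far simpler than what diffoscope itself does.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(first_build: Path, second_build: Path) -> list[str]:
    """Return relative paths that are missing from one build or differ in content."""
    first = hash_tree(first_build)
    second = hash_tree(second_build)
    names = first.keys() | second.keys()
    return sorted(name for name in names if first.get(name) != second.get(name))
```

An empty result means the two trees are bit-for-bit identical in content. Any path it does return is a candidate for closer inspection, which is where a content-aware comparison becomes useful.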
This month, diffoscope version 134 was uploaded to Debian unstable by Chris Lamb. He also made a number of changes to diffoscope itself, including:
- Always pass a filename with a zipnote otherwise it will return with a UNIX exit code of 9 and we fall back to displaying a binary difference for the entire file. […]
- Include the libarchive file listing for ISO images to ensure that timestamps – and not just dates – are visible in any difference. (#81)
- Ensure that our autopkgtests are run with our pyproject.toml present for the correct black source code formatter settings. (#945993)
- Rename the text_option_with_stdout […] and tidy some unnecessary boolean logic in the ISO9660 tests […].
In addition, Eli Schwartz fixed an error in the handling of the progress bar […] and Vagrant Cascadian added an external tool reference for the zstd compression format for GNU Guix […] as well as updating the version to 133 […] and 134 […] in that distribution.
Project website & documentation
There was more work performed on our website this month, including:
Bernhard M. Wiedemann:
Jelle van der Waa:
In addition, Paul Spooren added a new page giving an overview of our Continuous Tests […], Hervé Boutemy made a number of improvements to our Java and JVM documentation, expanding and clarifying various definitions as well as adding external links […][…][…][…], and Mariana Moreira added a .jekyll-cache entry to the .gitignore file […].
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Bernhard M. Wiedemann:
- charybdis (shell date & time)
- .vo files vary from build order)
- TIMESTAMP variable instead of build date)
- OpenStack Python packages (don’t package a
- perl (fix documentation-related build failure in 2020)
- php7-pear (sort a PHP-based
- pmix (date, time, host & user)
- pw3270 (make date &
- python-autobahn (report stuck tests on single CPU machine)
- ripgrep (report variations from CPU)
- rubygem-ronn (updated date patch)
- vpp (shell date, regression fix)
- Multiple patches to the grass Geographic Information System: […][…][…]
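Many of the patches above remove an embedded build date or time. One common way to do this is to honour the SOURCE_DATE_EPOCH environment variable, a convention specified by the Reproducible Builds project. The following is a minimal sketch of that idea, assuming a hypothetical build_date helper; it is not taken from any of the patches listed, which may well use other approaches.

```python
import os
import time
from datetime import datetime, timezone


def build_date(environ=None) -> str:
    """Return the date to embed in build output as an ISO 8601 UTC string."""
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    # Fall back to the current time only when no fixed timestamp is provided.
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Format in UTC so the result does not vary with the builder's timezone.
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# With the variable set, every build embeds the same date.
print(build_date({"SOURCE_DATE_EPOCH": "1577836800"}))
```

Passing the environment as a parameter is a design choice for this example only: it keeps the function easy to test without modifying the real process environment.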
Jelle van der Waa:
- #857454 re-opened against qtltools.
- #946315 filed against infernal (forwarded upstream).
- #946330 filed against usb-modeswitch-data (applied upstream).
- #946331 filed against gtk-doc (forwarded upstream).
- #946332 filed against nftables.
- #946333 filed against node-chart.js (forwarded upstream).
- #946335 filed against parsinsert.
- #947608 filed against markdown.
- #947708 filed against libtext-markdown-perl.
- Permit other distributions to use our web-based package scheduling script. […]
- Reformat our power-cycling script using Black and use the Python
- Introduce a dsources database view to simplify some queries […] and add a build_type field to support both “doublerebuilds” and verification rebuilds […].
- Move (almost) all the timestamps in the database schema from raw strings to “real” timestamp data types. […]
- Only block bots on jenkins.debian.net and tests.reproducible-builds.org, not any other sites. […]
kpcyrd (for Alpine Linux):
- Patch/install the abuild utility to one that is reproducible. […][…][…][…]
- Bump the number of build workers and collect garbage more frequently. […][…][…][…]
- Classify and display build results consistently. […][…][…]
- Ensure that tmux and ripgrep are installed. […][…]
- Support building packages in the future. […][…][…]
- Patch/install the
Mailing list summary
There was considerable activity on our mailing list this month. Firstly, Bernhard M. Wiedemann posted a thread asking What is the goal of reproducible builds? in order to encourage refinements, extra questions and other contributions to what an end-user experience of reproducible builds should or even could look like.
Eli Schwartz then resurrected a previous thread titled Progress in rpm and openSUSE in 2019 to clarify some points around Arch Linux and Python package installation. Hans-Christoph Steiner followed up on a separate thread originally started by Hervé Boutemy announcing the status of .buildinfo file support in the Java ecosystem, and Paul Spooren then informed the list that Google Summer of Code is now looking for projects for the latest cohort.
Contact
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. However, you can get in touch with us via:
This month’s report was written by Arnout Engelen, Bernhard M. Wiedemann, Chris Lamb, Hervé Boutemy, Holger Levsen, Jelle van der Waa, Lukas Puehringer and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.