Welcome to the September 2019 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. As a quick refresher of what our project is about, whilst anyone can inspect the source code of free software for malicious changes, most software is distributed to end users or servers as precompiled binaries. The motivation behind the reproducible builds effort is to ensure zero changes have been introduced during these compilation processes. This is achieved by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
In September’s report, we cover:
- Media coverage & events — more presentations, preventing Stuxnet, etc.
- Upstream news — kernel reproducibility, grafana, systemd, etc.
- Distribution work — reproducible images in Arch Linux, policy changes in Debian, etc.
- Software development — yet more work on diffoscope, upstream patches, etc.
- Misc news & getting in touch — from our mailing list, how to contribute, etc.
If you are interested in contributing to our project, please visit our Contribute page on our website.
Media coverage & events
In addition, our project was highlighted as part of a presentation by Andrew Martin at the All Systems Go conference in Berlin titled Rootless, Reproducible & Hermetic: Secure Container Build Showdown, and Björn Michaelsen from the Document Foundation presented at the 2019 LibreOffice Conference in Almería in Spain on the status of reproducible builds in the LibreOffice office suite.
In academia, Anastasis Keliris and Michail Maniatakos from the New York University Tandon School of Engineering published a paper titled ICSREF: A Framework for Automated Reverse Engineering of Industrial Control Systems Binaries (PDF) that speaks to concerns regarding the security of Industrial Control Systems (ICS) such as those attacked via Stuxnet. The paper outlines their ICSREF tool for reverse-engineering binaries from such systems and furthermore demonstrates a scenario whereby a commercial smartphone equipped with ICSREF could be easily used to compromise such infrastructure.
Similar to previous incarnations of the event, the heart of the workshop will be three days of moderated sessions with surrounding “hacking” days and will include a huge diversity of participants from Arch Linux, coreboot, Debian, F-Droid, GNU Guix, Google, Huawei, in-toto, MirageOS, NYU, openSUSE, OpenWrt, Tails, Tor Project and many more. If you would like to learn more about the event and how to register, please visit our dedicated event page.
Ben Hutchings added documentation to the Linux kernel regarding how to make the build reproducible. As he mentioned in the commit message, the kernel is “actually” reproducible but the end-to-end process was not previously documented in one place and thus Ben describes the workflow and environment needed to ensure a reproducible build.
Daniel Edgecumbe submitted a pull request, which was subsequently merged, to the logging/journaling component of systemd in order that the output of e.g. journalctl --update-catalog does not differ between subsequent runs despite there being no changes in the input files.
Jelle van der Waa noticed that if the grafana monitoring tool was built within a source tree devoid of Git metadata then the current timestamp was used instead, leading to an unreproducible build. To avoid this, Jelle submitted a pull request in order that it use SOURCE_DATE_EPOCH if available.
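The general pattern behind this kind of fix can be sketched in a few lines of Python. This is only an illustration of honouring SOURCE_DATE_EPOCH, not the code from the grafana pull request; the function name build_timestamp and its environ parameter are hypothetical.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp to embed in build output, as an aware UTC datetime.

    Hypothetical helper: prefers SOURCE_DATE_EPOCH and only falls back to
    the current time when the variable is not set.
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # Reproducible path: use the externally supplied value.
        seconds = int(epoch)
    else:
        # Fallback: the current time, which makes the output unreproducible.
        seconds = int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(build_timestamp().isoformat())
```

Passing the environment in explicitly keeps the helper easy to test; a build script would normally just call it with no arguments.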
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. The Thunderbird and kernel-vanilla packages will be among the larger ones to become reproducible soon, and there were additional Python patches to address reproducibility issues in Python modules that have C bindings.
OpenWrt is a Linux-based operating system targeting embedded devices such as wireless network routers. This month, Paul Spooren (aparcar) switched the toolchain to use GCC version 8 by default in order to support the -ffile-prefix-map= option, which permits a varying build path without affecting the binary result of the build […]. In addition, Paul updated the kernel-defaults package to ensure that the SOURCE_DATE_EPOCH environment variable is considered when creating the
Lukas Pühringer prepared an upload of python-securesystemslib version 0.11.3-1 to Debian unstable, which was sponsored by Holger Levsen. python-securesystemslib is a dependency of in-toto, a framework to protect the integrity of software supply chains.
The mkinitcpio component of Arch Linux was updated by Daniel Edgecumbe in order that it generates reproducible initramfs images by default, meaning that two subsequent runs of mkinitcpio produce two files that are identical at the binary level. The commit message elaborates on its methodology:
Timestamps within the initramfs are set to the Unix epoch of 1970-01-01. Note that in order for the build to be fully reproducible, the compressor specified (e.g. gzip, xz) must also produce reproducible archives. At the time of writing, as an inexhaustive example, the lzop compressor is incapable of producing reproducible archives due to the insertion of a runtime timestamp.
In addition, a bug was created to track progress on making the Arch Linux ISO images reproducible.
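The commit message's point about compressors embedding timestamps can be illustrated with Python's standard gzip module. This is a minimal sketch, not what mkinitcpio does internally; the compress helper and the sample payload are made up for the example.

```python
import gzip
import io


def compress(data, mtime):
    """Gzip-compress data, writing the given mtime into the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as fh:
        fh.write(data)
    return buf.getvalue()


payload = b"initramfs contents\n"

# Fixed timestamp (the Unix epoch): repeated runs give identical bytes.
first = compress(payload, mtime=0)
second = compress(payload, mtime=0)

# Different header timestamps: same payload, different archive bytes.
# The mtime field occupies bytes 4 to 7 of the gzip header.
stamped_a = compress(payload, mtime=1)
stamped_b = compress(payload, mtime=2)

print("fixed mtime identical:", first == second)
print("varying mtime identical:", stamped_a == stamped_b)
```

If mtime is left unset, GzipFile writes the current time instead, which is the same class of problem the commit message describes for lzop.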
In July, Holger Levsen filed a bug against the underlying tool that maintains the Debian archive (“dak”) after he noticed that .buildinfo metadata files were not being automatically propagated in the case that packages had to be manually approved in the “NEW queue”. After it was pointed out that the files were being retained in a separate location, Benjamin Hof proposed a patch for the issue that was merged and deployed this month.
Aurélien Jarno filed a bug against the Debian Policy (#940234) to request a section be added regarding the reproducibility of source packages. Whilst there is already a section about reproducibility in the Policy, it only mentions binary packages. Aurélien suggests that it:
… might be a good idea to add a new requirement that repeatedly building the source package in the same environment produces identical
In addition, 51 reviews of Debian packages were added, 22 were updated and 47 were removed this month adding to our knowledge about identified issues. Many issue types were added by Chris Lamb including
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
- colobot-data (sort a Python readdir, forwarded upstream)
- griefly (sort a Python readdir, forwarded upstream)
- latex2html (drop LaTeX log file with date)
- MozillaFirefox (make Profile-Guided Optimisation optional)
- ninja (build failure when built without parallelism)
- python-futures (fix build failure)
- python-holidays (fix build failure in 2020)
- python-iminuit (sort a Python glob)
- python-ioflo (fix build failure via security certificate renewal)
- python-keystoneauth1 (fix build failure in 2020)
- volk (report compile-time CPU-detection)
- wget2 (drop a build date)
- Chris Lamb (“lamby”):
- #939546 filed against
- #939547 filed against
- #939548 filed against
- #939549 filed against
- #939650 filed against
- #940013 filed against
- #940156 filed against
- #940639 filed against
- #941072 filed against
- #941116 filed against
libguestfs components have received a patch to support
- #939546 filed against
- Rebecca N. Palmer:
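Several of the patches listed above fix ordering problems ("sort a Python readdir", "sort a Python glob"). The sketch below shows the general idea in Python and is not taken from any of those patches; the helper name list_inputs and the file names are hypothetical.

```python
import os
import tempfile


def list_inputs(directory):
    """Return file names in a stable, sorted order.

    os.listdir() returns entries in arbitrary (filesystem-dependent) order,
    so anything generated by iterating over it can differ between builds.
    """
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in a deliberately unsorted order.
    for name in ("b.txt", "c.txt", "a.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(name)
    names = list_inputs(tmp)

print(names)
```

The same applies to glob.glob(), whose results are likewise returned in arbitrary order and can be wrapped in sorted().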
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb uploaded versions
125 and made the following changes:
- Add /srv/diffoscope/bin to the Docker image path. (#70)
- When skipping tests due to the lack of installed tool, print the package that might provide it. […]
- Update the “no progressbar” logging message to match the parallel missing tlsh module warnings. […]
- Update “requires foo” messages to clarify that they are referring to Python modules. […]
test_libmix_differences ELF binary test requires the
- Build the OCaml test input files on-demand rather than shipping them with the package in order to prevent test failures with OCaml 4.08. (#67)
- Also conditionally skip the identification and “no differences” tests as we require the OCaml compiler to be present when building the test files themselves. (#940471)
- Rebuild our test squashfs images to exclude the character device as they require root or fakeroot to extract. (#65)
- Many code cleanups, including dropping some unnecessary control flow […], dropping unnecessary pass statements […] and dropping explicit inheritance from the object class as it is unnecessary in Python 3 […].
In addition, Marc Herbert completely overhauled the handling of ELF binaries, particularly around many assumptions that were previously being made via file extensions, etc. […][…][…] and updated the testsuite to support a newer version of the coreboot utilities […]. Mattia Rizzolo then ensured that diffoscope does not crash when the progress bar module is missing but the functionality was requested […] and made our version checking code more lenient […]. Lastly, Vagrant Cascadian not only updated diffoscope to versions 123 and 125, but also enabled a more complete test suite in the GNU Guix distribution. […][…][…][…][…][…]
There was yet more effort put into our website this month, including:
- Chris Lamb:
- Holger Levsen:
- Add a link to our style guide on our “tools” page. […]
- Rework the handling of news/events, including adding a news archive page […] and differentiating between news and reports on the homepage […].
- Large number of changes to the “Who is Involved?” page, including adding a link to F-Droid’s verification server […] and their verification tool for end-users […] as well as adding the Civil Infrastructure Project (CIP) as a sponsor […]
- Include a link to our testing framework in all navigation elements. […]
- Add/improve a number of presentation entries on our Talks & Resources page. […][…][…][…][…]
In addition, Cindy Kim added in-toto to our “Who is Involved?” page, James Fenn updated our homepage to fix a number of spelling and grammar issues […] and Peter Conrad added BitShares to our list of projects interested in Reproducible Builds […].
strip-nondeterminism is our tool to remove specific non-deterministic results from successful builds. This month, Marc Herbert made a huge number of changes including:
- GNU ar handler:
- Don’t corrupt the pseudo file mode of the symbols table.
- Add test files for “symtab” (/) and long names (
- Don’t corrupt the SystemV/GNU table of long filenames.
- Add a new $File::StripNondeterminism::verbose global and, if enabled, tell the user that ar(1) could not set the symbol table’s mtime.
In addition, Chris Lamb investigated issues in the Archive::Zip module with the Debian Perl Team, including a problem with corruption of members that use bzip compression as well as a regression, reported in/around Debian bug #940973, whereby various metadata fields were not being updated.
- Alexander “lynxis” Couzens:
- Holger Levsen:
- Correctly handle the $DEBUG variable in OpenWrt builds. […]
- Refactor and notify the #archlinux-reproducible IRC channel for problems in this distribution. […]
- Ensure that only one mail is sent when rebooting nodes. […]
- Unclutter the output of a Debian maintenance job. […]
- Drop a “todo” entry as we have varied on a merged /usr for some time now. […]
In addition, Paul Spooren added an OpenWrt snapshot build script which downloads .buildinfo and related checksums from the relevant download server and attempts to rebuild and then validate them for reproducibility. […]
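The validation step of such a workflow amounts to comparing the checksums of rebuilt files against the published ones. The following Python sketch illustrates that comparison only; it is not Paul's script, and the names sha256_of, verify_rebuild, image.bin and other.bin are all hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_rebuild(expected, rebuilt_dir):
    """Compare rebuilt files against expected checksums.

    expected maps a file name to its published hex SHA-256 digest.
    Returns the sorted names that are missing or whose digest differs.
    """
    mismatches = []
    for name, digest in expected.items():
        path = os.path.join(rebuilt_dir, name)
        if not os.path.isfile(path) or sha256_of(path) != digest:
            mismatches.append(name)
    return sorted(mismatches)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a local rebuild that produced a single artifact.
    with open(os.path.join(tmp, "image.bin"), "wb") as fh:
        fh.write(b"firmware")
    published = {
        "image.bin": hashlib.sha256(b"firmware").hexdigest(),
        "other.bin": hashlib.sha256(b"something else").hexdigest(),
    }
    unreproducible = verify_rebuild(published, tmp)

print(unreproducible)
```

An empty result would mean every published artifact was reproduced bit-for-bit by the local rebuild.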
reprotest is our end-user tool to build the same source code twice in different environments and then check the binaries produced by each build for differences. This month, a change by Dmitry Shachnev was merged to not use the faketime wrapper at all when asked to not vary time […] and Holger Levsen subsequently released this as version 0.7.9 as well as dramatically overhauling the packaging […][…].
Misc news & getting in touch
On our mailing list Rebecca N. Palmer started a thread titled Addresses in IPython output which points out and attempts to find a solution to a problem with Python packages, whereby objects that don’t have an explicit string representation have a default one that includes their memory address. This causes problems with reproducible builds if/when such output appears in generated documentation.
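The underlying behaviour is easy to demonstrate in plain Python (shown here for CPython, where the default representation embeds the object's address). The sketch below shows two possible mitigations, an explicit __repr__ and scrubbing addresses from generated text. Neither is presented as the outcome of the mailing list thread, and the names Plain, Stable and scrub_addresses are hypothetical.

```python
import re


class Plain:
    """No __repr__: falls back to object.__repr__, which embeds an address."""


class Stable:
    """Explicit __repr__ built only from the object's own state."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Stable(name={self.name!r})"


def scrub_addresses(text):
    """Replace hexadecimal memory addresses in default reprs with a placeholder."""
    return re.sub(r" at 0x[0-9a-fA-F]+>", " at 0x...>", text)


default_text = repr(Plain())           # varies from run to run
stable_text = repr(Stable("example"))  # identical on every run
scrubbed = scrub_addresses(default_text)

print(default_text)
print(stable_text)
print(scrubbed)
```

Defining __repr__ fixes the problem at the source, whereas scrubbing is a post-processing workaround for output produced by code that cannot easily be changed.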
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.