Reproducible Builds in September 2019

View all our monthly reports


Welcome to the September 2019 report from the Reproducible Builds project!

In these reports we outline the most important things that we have been up to over the past month. As a quick refresher of what our project is about, whilst anyone can inspect the source code of free software for malicious changes, most software is distributed to end users or servers as precompiled binaries. The motivation behind the reproducible builds effort is to ensure zero changes have been introduced during these compilation processes. This is achieved by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.

In September’s report, we cover:

  • Media coverage & eventsmore presentations, preventing Stuxnet, etc.
  • Upstream newskernel reproducibility, grafana, systemd, etc.
  • Distribution workreproducible images in Arch Linux, policy changes in Debian, etc.
  • Software developmentyet more work on diffoscope, upstream patches, etc.
  • Misc news & getting in touchfrom our mailing list how to contribute, etc

If you are interested in contributing to our project, please visit our Contribute page on our website.


Media coverage & events

This month Vagrant Cascadian attended the 2019 GNU Tools Cauldron in Montréal, Canada and gave a presentation entitled Reproducible Toolchains for the Win (video).

In addition, our project was highlighted as part of a presentation by Andrew Martin at the All Systems Go conference in Berlin titled Rootless, Reproducible & Hermetic: Secure Container Build Showdown, and Björn Michaelsen from the Document Foundation presented at the 2019 LibreOffice Conference in Almería in Spain on the status of reproducible builds in the LibreOffice office suite.

In academia, Anastasis Keliris and Michail Maniatakos from the New York University Tandon School of Engineering published a paper titled ICSREF: A Framework for Automated Reverse Engineering of Industrial Control Systems Binaries (PDF) that speaks to concerns regarding the security of Industrial Control Systems (ICS) such as those attacked via Stuxnet. The paper outlines their ICSREF tool for reverse-engineering binaries from such systems and furthermore demonstrates a scenario whereby a commercial smartphone equipped with ICSREF could be easily used to compromise such infrastructure.

Lastly, It was announced that Vagrant Cascadian will present a talk at SeaGL in Seattle, Washington during November titled There and Back Again, Reproducibly.


2019 Summit

Registration for our fifth annual Reproducible Builds summit that will take place between 1st → 8th December in Marrakesh, Morocco has opened and personal invitations have been sent out.

Similar to previous incarnations of the event, the heart of the workshop will be three days of moderated sessions with surrounding “hacking” days and will include a huge diversity of participants from Arch Linux, coreboot, Debian, F-Droid, GNU Guix, Google, Huawei, in-toto, MirageOS, NYU, openSUSE, OpenWrt, Tails, Tor Project and many more. If you would like to learn more about the event and how to register, please visit our our dedicated event page.


Upstream news

Ben Hutchings added documentation to the Linux kernel regarding how to make the build reproducible. As he mentioned in the commit message, the kernel is “actually” reproducible but the end-to-end process was not previously documented in one place and thus Ben describes the workflow and environment needed to ensure a reproducible build.

Daniel Edgecumbe submitted a pull request which was subsequently merged to the logging/journaling component of systemd in order that the output of e.g. journalctl --update-catalog does not differ between subsequent runs despite there being no changes in the input files.

Jelle van der Waa noticed that if the grafana monitoring tool was built within a source tree devoid of Git metadata then the current timestamp was used instead, leading to an unreproducible build. To avoid this, Jelle submitted a pull request in order that it use SOURCE_DATE_EPOCH if available.

Mes (a Scheme-based compiler for our “sister” bootstrappable builds effort) announced their 0.20 release.


Distribution work

Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. Thunderbird and kernel-vanilla packages will be among the larger ones to become reproducible soon and there were additional Python patches to help reproducibility issues of modules written in this language that have C bindings.

OpenWrt is a Linux-based operating system targeting embedded devices such as wireless network routers. This month, Paul Spooren (aparcar) switched the toolchain the use the GCC version 8 by default in order to support the -ffile-prefix-map= which permits a varying build path without affecting the binary result of the build []. In addition, Paul updated the kernel-defaults package to ensure that the SOURCE_DATE_EPOCH environment variable is considered when creating the the /init directory.

Alexander “lynxis” Couzens began working on a set of build scripts for creating firmware and operating system artifacts in the coreboot distribution.

Lukas Pühringer prepared an upload which was sponsored by Holger Levsen of python-securesystemslib version 0.11.3-1 to Debian unstable. python-securesystemslib is a dependency of in-toto, a framework to protect the integrity of software supply chains.

Arch Linux

The mkinitcpio component of Arch Linux was updated by Daniel Edgecumbe in order that it generates reproducible initramfs images by default, meaning that two subsequent runs of mkinitcpio produces two files that are identical at the binary level. The commit message elaborates on its methodology:

Timestamps within the initramfs are set to the Unix epoch of 1970-01-01. Note that in order for the build to be fully reproducible, the compressor specified (e.g. gzip, xz) must also produce reproducible archives. At the time of writing, as an inexhaustive example, the lzop compressor is incapable of producing reproducible archives due to the insertion of a runtime timestamp.

In addition, a bug was created to track progress on making the Arch Linux ISO images reproducible.

Debian

In July, Holger Levsen filed a bug against the underlying tool that maintains the Debian archive (“dak”) after he noticed that .buildinfo metadata files were not being automatically propagated in the case that packages had to be manually approved in “NEW queue”. After it was pointed out that the files were being retained in a separate location, Benjamin Hof proposed a patch for the issue that was merged and deployed this month.

Aurélien Jarno filed a bug against the Debian Policy (#940234) to request a section be added regarding the reproducibility of source packages. Whilst there is already a section about reproducibility in the Policy, it only mentions binary packages. Aurélien suggest that it:

… might be a good idea to add a new requirement that repeatedly building the source package in the same environment produces identical .dsc files.

In addition, 51 reviews of Debian packages were added, 22 were updated and 47 were removed this month adding to our knowledge about identified issues. Many issue types were added by Chris Lamb including buildpath_in_code_generated_by_bison, buildpath_in_postgres_opcodes and ghc_captures_build_path_via_tempdir.

Software development

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.

This month, Chris Lamb uploaded versions 123, 124 and 125 and made the following changes:

  • New features:

    • Add /srv/diffoscope/bin to the Docker image path. (#70)
    • When skipping tests due to the lack of installed tool, print the package that might provide it. []
    • Update the “no progressbar” logging message to match the parallel missing tlsh module warnings. []
    • Update “requires foo” messages to clarify that they are referring to Python modules. []
  • Testsuite updates:

    • The test_libmix_differences ELF binary test requires the xxd tool. (#940645)
    • Build the OCaml test input files on-demand rather than shipping them with the package in order to prevent test failures with OCaml 4.08. (#67)
    • Also conditionally skip the identification and “no differences” tests as we require the OCaml compiler to be present when building the test files themselves. (#940471)
    • Rebuild our test squashfs images to exclude the character device as they requires root or fakeroot to extract. (#65)
  • Many code cleanups, including dropping some unnecessary control flow [], dropping unnecessary pass statements [] and dropping explicitly inheriting from object class as it unnecessary in Python 3 [].

In addition, Marc Herbert completely overhauled the handling of ELF binaries particularly around many assumptions that were previously being made via file extensions, etc. [][][] and updated the testsuite to support a newer version of the coreboot utilities. []. Mattia Rizzolo then ensured that diffoscope does not crash when the progress bar module is missing but the functionality was requested [] and made our version checking code more lenient []. Lastly, Vagrant Cascadian not only updated diffoscope to versions 123 and 125, he enabled a more complete test suite in the GNU Guix distribution. [][][][][][]

Project website

There was yet more effort put into our our website this month, including:

In addition, Cindy Kim added in-toto to our “Who is Involved?” page, James Fenn updated our homepage to fix a number of spelling and grammar issues [] and Peter Conrad added BitShares to our list of projects interested in Reproducible Builds [].

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from successful builds. This month, Marc Herbert made a huge number of changes including:

  • GNU ar handler:
    • Don’t corrupt the pseudo file mode of the symbols table.
    • Add test files for “symtab” (/) and long names (//).
    • Don’t corrupt the SystemV/GNU table of long filenames.
  • Add a new $File::StripNondeterminism::verbose global and, if enabled, tell the user that ar(1) could not set the symbol table’s mtime.

In addition, Chris Lamb performed some issue investigation with the Debian Perl Team regarding issues in the Archive::Zip module including a problem with corruption of members that use bzip compression as well as a regression whereby various metadata fields were not being updated that was reported in/around Debian bug #940973.

Test framework

We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org.

  • Alexander “lynxis” Couzens:
    • Fix missing .xcompile in the build system. []
    • Install the GNAT Ada compiler on all builders. []
    • Don’t install the iasl ACPI power management compiler/decompiler. []
  • Holger Levsen:
    • Correctly handle the $DEBUG variable in OpenWrt builds. []
    • Fefactor and notify the #archlinux-reproducible IRC channel for problems in this distribution. []
    • Ensure that only one mail is sent when rebooting nodes. []
    • Unclutter the output of a Debian maintenance job. []
    • Drop a “todo” entry as we vary on a merged /usr for some time now. []

In addition, Paul Spooren added an OpenWrt snapshot build script which downloads .buildinfo and related checksums from the relevant download server and attempts to rebuild and then validate them for reproducibility. []

The usual node maintenance was performed by Holger Levsen [][][], Mattia Rizzolo [] and Vagrant Cascadian [][].

reprotest

reprotest is our end-user tool to build same source code twice in different environments and then check the binaries produced by each build for differences. This month, a change by Dmitry Shachnev was merged to not use the faketime wrapper at all when asked to not vary time [] and Holger Levsen subsequently released this as version 0.7.9 as dramatically overhauling the packaging [][].


Misc news & getting in touch

On our mailing list Rebecca N. Palmer started a thread titled Addresses in IPython output which points out and attempts to find a solution to a problem with Python packages, whereby objects that don’t have an explicit string representation have a default one that includes their memory address. This causes problems with reproducible builds if/when such output appears in generated documentation.

If you are interested in contributing the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:



This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.




View all our monthly reports