Reproducible Builds in July 2019

View all our monthly reports


Welcome to the July 2019 report from the Reproducible Builds project!

In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.

In July’s report, we cover:

  • Front page: Media coverage, upstream news, etc.
  • Distribution work: Shenanigans at DebConf19
  • Software development: Software transparency, yet more diffoscope work, etc.
  • On our mailing list: GNU tools, education and buildinfo files
  • Getting in touch: … and how to contribute

If you are interested in contributing to our project, we enthusiastically invite you to visit our Contribute page on our website.


Front page

Nico Alt wrote a detailed and well-researched article titled “Trust is good, control is better” which discusses reproducible builds in F-Droid, the alternative application repository for Android mobile phones. In contrast to the bigger commercial app stores, F-Droid only offers apps that are free and open source software. The post not only demonstrates using diffoscope but also talks more generally about how reproducible builds can prevent single developers or other important centralised infrastructure from becoming targets for toolchain-based attacks.

Later in the month, F-Droid’s aforementioned reproducibility status was mentioned on episode 68 of the Late Night Linux podcast. (direct link to 14:12)

Morten (“Foxboron”) Linderud published his academic thesis “Reproducible Builds: break a log, good things come in trees” which investigates and describes how transparency log overlays can provide additional security guarantees for computers automatically producing software packages. The thesis was part of Morten’s studies at the University of Bergen, Norway and is an extension of the work New York University Tandon School of Engineering has been doing with package rebuilder integration in APT.

Mike Hommey posted to his blog about Reproducing the Linux builds of Firefox 68, which relies on the fact that builds shipped by Mozilla should be reproducible from this version onwards. He discusses the problems caused by the builds being optimised with Profile-Guided Optimisation (PGO) but, armed with the now-published profiling data, Mike provides Docker-based instructions on how to reproduce the published builds yourself.

Joel Galenson has been making progress on a reproducible Rust compiler which includes support for a --remap-path-prefix argument, related to the concepts and problems involved in the BUILD_PATH_PREFIX_MAP proposal to fix issues with build paths being embedded in binaries.
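To illustrate the general idea behind remapping path prefixes, here is a minimal Python sketch. It is neither the Rust compiler's logic nor the encoding described in the BUILD_PATH_PREFIX_MAP proposal: the function name remap_path, the (old, new) pair format and the rule that later entries win are all assumptions made for this example.

```python
def remap_path(path, prefix_map):
    """Replace a build-specific directory prefix with a fixed one.

    `prefix_map` is a list of (old_prefix, new_prefix) pairs. In this
    sketch later entries take priority, which is an assumption.
    """
    for old, new in reversed(prefix_map):
        old = old.rstrip("/")
        # Only match whole path components, so "/build" does not
        # accidentally match "/buildx".
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path


# Hypothetical build directories from two different machines.
mapping_alice = [("/home/alice/build", "/build")]
mapping_bob = [("/tmp/bob/work", "/build")]
print(remap_path("/home/alice/build/src/main.rs", mapping_alice))
print(remap_path("/tmp/bob/work/src/main.rs", mapping_bob))
```

Both calls yield the same embedded path even though the builds took place in different directories, which is the property such a mapping is meant to provide.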

Lastly, Alessio Treglia posted to their blog about Cosmos Hub and Reproducible Builds which describes the reproducibility work happening in the Cosmos Hub, a network of interconnected blockchains. Specifically, Alessio talks about work being done on the Gaia development kit for the Hub.



Distribution work

Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. Enabling Link Time Optimization (LTO) in this distribution’s “Tumbleweed” branch caused multiple issues due to the number of cores on the build host being added to the CFLAGS variable. This affected, for example, a debuginfo/rpm header and also resulted in CFLAGS appearing in built binaries such as fldigi, gmp, haproxy, etc.

As highlighted in last month’s report, the OpenWrt project (a Linux operating system targeting embedded devices such as wireless network routers) hosted a summit in Hamburg, Germany. Their full summit report and roundup is now available and covers many general aspects of that distribution, including the work on reproducible builds that was done during the event.

Debian

It was an extremely productive time in Debian this month in and around DebConf19, the 20th annual conference for both contributors and users, which was held at the Federal University of Technology in Paraná (UTFPR) in Curitiba, Brazil, from July 21 to 28. The conference was preceded by “DebCamp” from the 14th until the 19th, with an additional “Open Day” targeted at the more-general public on the 20th.

There were a number of talks touching on the topic of reproducible builds and secure toolchains throughout the conference.

There were naturally countless discussions regarding Reproducible Builds in and around the conference on the questions of tooling, infrastructure and our next steps as a project.

The release of Debian 10 buster has also meant the release cycle for the next release (codenamed “bullseye”) has just begun. As part of this, the Release Team recently announced that Debian will no longer allow binaries built and uploaded by maintainers on their own machines to be part of the upcoming release. This is great news not only for toolchain security in general but also in that it will ensure that all binaries that will form part of this release will likely have a .buildinfo file and thus metadata that could be used by others to reproduce and verify the builds.

Holger Levsen filed a bug against the underlying tool that maintains the Debian archive (“dak”) after he noticed that .buildinfo metadata files were not being automatically propagated if packages had to be manually approved or processed in the so-called “NEW queue”. After it was pointed out that the files were being retained in a separate location, Benjamin Hof proposed a potential patch for the issue which is pending review.

David Bremner posted to his blog about “Yet another buildinfo database” that provides a SQL interface for querying .buildinfo attestation documents, particularly focusing on identifying packages that were built with a specific (and possibly buggy) build-dependency. Later at DebConf, David demonstrated his tool live (starting at 36:30).
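As a rough illustration of the kind of query such a database makes possible, the following Python sketch uses an in-memory SQLite database. The schema (the builds and build_depends tables), the function names and the sample rows are all invented for this example and are not taken from David’s tool.

```python
import sqlite3


def demo_database():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE builds (id INTEGER PRIMARY KEY, source TEXT, version TEXT);
        CREATE TABLE build_depends (build_id INTEGER, package TEXT, version TEXT);
        -- Made-up sample data, purely for illustration.
        INSERT INTO builds VALUES (1, 'hello', '2.10-2');
        INSERT INTO builds VALUES (2, 'world', '1.0-1');
        INSERT INTO build_depends VALUES (1, 'gcc-8', '8.3.0-6');
        INSERT INTO build_depends VALUES (2, 'gcc-8', '8.3.0-7');
        """
    )
    return conn


def packages_built_with(conn, dependency, dependency_version):
    """Return (source, version) pairs built with the given build-dependency."""
    rows = conn.execute(
        "SELECT b.source, b.version FROM builds b "
        "JOIN build_depends d ON d.build_id = b.id "
        "WHERE d.package = ? AND d.version = ? "
        "ORDER BY b.source",
        (dependency, dependency_version),
    )
    return rows.fetchall()


print(packages_built_with(demo_database(), "gcc-8", "8.3.0-6"))
```

With the build-dependencies recorded in each .buildinfo file loaded into tables like these, finding every package affected by a particular toolchain version becomes a single query.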

Ivo de Decker (“ivodd”) scheduled rebuilds of over 600 packages that last experienced an upload to the archive in December 2016 or earlier. This was so that they would be built using a version of the low-level dpkg package build tool that supports the generation of reproducible binary packages. The effect of this on the main archive will be deliberately staggered and thus visible throughout the upcoming weeks, potentially resulting in some of these packages now failing to build.

Joaquin de Andres posted an update regarding the work being done on continuous integration on Debian’s GitLab instance at DebConf19 in which he mentions, inter alia, a tool called atomic-reprotest. This is a relatively new utility to help debug failures logged by our reprotest tool which attempts to test whether a build is reproducible or not. This tool was also mentioned in a subsequent lightning talk.

Chris Lamb filed two bugs to drop the test jobs for both strip-nondeterminism (#932366) and reprotest (#932374) after modifying them to build on the Salsa server’s own continuous integration platform, and Holger Levsen resolved them shortly afterwards.

Lastly, 63 reviews of Debian packages were added, 72 were updated and 22 were removed this month, adding to our knowledge about identified issues. Chris Lamb added and categorised four new issue types: umask_in_java_jar_file, built_by-in_java_manifest_mf, timestamps_in_manpages_generated_by_lopsubgen and codadef_coda_data_files.


Software development

The goal of Benjamin Hof’s Software Transparency effort is to improve on the cryptographic signatures of the APT package manager by introducing a Merkle tree-based transparency log for package metadata and source code, in a similar vein to certificate transparency. This month, he pushed a number of repositories to our revision control system for further future development and review.
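For readers unfamiliar with Merkle trees, the sketch below computes a single root hash over a list of log entries in Python. It is a simplified illustration and not Benjamin’s implementation: the use of SHA-256, the 0x00/0x01 prefix bytes (borrowed from the domain-separation approach used in certificate transparency) and the handling of an odd node are assumptions of this example.

```python
import hashlib


def _sha256(data):
    return hashlib.sha256(data).digest()


def merkle_root(entries):
    """Return the hex root hash of a Merkle tree over `entries` (bytes)."""
    if not entries:
        return hashlib.sha256(b"").hexdigest()
    # Leaves and inner nodes are hashed with different prefixes so that
    # a leaf can never be confused with an inner node.
    level = [_sha256(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(_sha256(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2:
            # An unpaired node is carried up to the next level unchanged.
            parents.append(level[-1])
        level = parents
    return level[0].hex()


# Hypothetical package metadata entries.
log = [b"hello_2.10-2.dsc", b"world_1.0-1.dsc", b"foo_0.1-1.dsc"]
print(merkle_root(log))
```

Changing, removing or reordering any entry changes the root, so anyone holding the same root hash can be confident they have been shown the same log contents.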

In addition, Bernhard M. Wiedemann updated his (deliberately) unreproducible demonstration project to add support for floating point variations as well as changes in the project’s copyright year.

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate, and this month we wrote a large number of such patches.

Neal Gompa, Michael Schröder & Miro Hrončok responded to Fedora’s recent change to rpm-config with some new developments within rpm to fix an unreproducible “Build Date” and reverted a change to the Python interpreter to switch back to unreproducible/time-based compile caches.

Lastly, kpcyrd submitted a pull request for Alpine Linux to add SOURCE_DATE_EPOCH support to the abuild build tool in this operating system.
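As a reminder of what SOURCE_DATE_EPOCH support typically involves, here is a minimal Python sketch. It is not the abuild change (abuild is not written in Python) and the function names are hypothetical: the idea is simply to prefer the timestamp in the environment variable over the current time whenever a build embeds a date.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the time to embed in build output as a UTC struct_time.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) if it is set
    and falls back to the current time otherwise.
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        return time.gmtime()
    return time.gmtime(int(epoch))


def build_date_string(environ=None):
    """Format the build date as YYYY-MM-DD in UTC."""
    return time.strftime("%Y-%m-%d", build_timestamp(environ))


print(build_date_string({"SOURCE_DATE_EPOCH": "0"}))  # 1970-01-01
```

Formatting the value in UTC rather than local time matters here, as otherwise the timezone of the build host would reintroduce a source of variation.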


diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.

This month, Chris Lamb made the following changes:

  • Add support for Java .jmod modules (#60). However, not all versions of file(1) support detection of these files yet, so we perform a manual comparison instead.
  • If a command fails to execute but does not print anything to standard error, try and include the first line of standard output in the message we include in the difference. This was motivated by readelf(1) returning its error messages on standard output. (#59)
  • Add general support for file(1) 5.37 (#57) but also adjust the code to not fail in tests when, eg, we do not have a sufficiently newer or older version of file(1) (#931881).
  • Factor out the ability to ignore the exit codes of zipinfo and zipinfo -v in the presence of non-standard headers, but only override the exit code from our special-cased calls to zipinfo(1) if they are 1 or 2 to avoid potentially masking real errors.
  • Cease ignoring test failures in stable-backports.
  • Add missing textual DESCRIPTION headers for .zip and “Mozilla”-optimised .zip files.
  • Merge two overlapping environment variables into a single DIFFOSCOPE_FAIL_TESTS_ON_MISSING_TOOLS.
  • Update some reporting:
    • Re-add “return code” noun to “Command foo exited with X” error messages.
    • Use repr(..)-style output when printing DIFFOSCOPE_TESTS_FAIL_ON_MISSING_TOOLS in skipped test rationale text.
    • Skip the extra newline in Output:\nfoo.
  • Add some explicit return values to appease Pylint, etc.
  • Also include the python3-tlsh in the Debian test dependencies.
  • Released and uploaded versions 116, 117, 118, 119 & 120.

In addition, Marc Herbert provided a patch to catch failures to disassemble ELF binaries.
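The standard-output fallback described in the list above can be sketched in a few lines of Python. This is an illustration of the idea only and not diffoscope’s actual code: the function name failure_message, its arguments and the exact wording of the message are assumptions.

```python
def failure_message(command, returncode, stdout, stderr):
    """Build a one-line description of a failed command (a sketch only)."""
    message = "Command {!r} exited with return code {}.".format(command, returncode)
    detail = stderr.strip()
    if not detail:
        # Nothing was printed to standard error, so fall back to the
        # first line of standard output, if there is one.
        lines = stdout.strip().splitlines()
        detail = lines[0] if lines else ""
    if detail:
        message += " Output: " + detail
    return message


# A hypothetical tool that reports its error on standard output.
print(failure_message("foo", 1, "bad input\nsecond line\n", ""))
```

The captured strings could come from, for instance, subprocess.run(..) with both streams piped; keeping the formatting in a separate function makes the fallback easy to test in isolation.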


Project website

There was yet more effort put into our website this month, including:

  • Bernhard M. Wiedemann:
    • Update multiple works to use standard (or at least consistent) terminology.
    • Document an alternative Python snippet in the SOURCE_DATE_EPOCH examples.
  • Chris Lamb:
    • Split out our non-fiscal sponsors with a description and make them non-display three-in-a-row.
    • Correct references to 1&1 IONOS (née Profitbricks).
    • Reduce ambiguity in our environment names.
    • Recreate the badge image, saving the .svg alongside it.
    • Update our fiscal sponsors.
    • Tidy the weekly reports section on the news page, fixup the typography on the documentation page and make all headlines stand out a bit more.
    • Drop some old CSS files and fonts.
    • Tidy news page a bit.
    • Fixup a number of issues in the report template and previous reports.

Holger Levsen also added explanations on how to install diffoscope on OpenBSD and FreeBSD to its homepage and Arnout Engelen added a preliminary and work-in-progress idea for a badge or “shield” program for upstream projects.

A special thank you to Alexander Borkowski, Georg Faerber and John Scott for their individual fixes. To err is human; to reproduce, divine.


strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Niko Tyni provided a patch to use the Perl Sub::Override library for some temporary workarounds for issues in Archive::Zip instead of Monkey::Patch which was due for deprecation.

In addition, Chris Lamb made the following changes:

  • Identify data files from the COmmon Data Access (CODA) framework as being .zip files.
  • Support OpenJDK “.jmod” files.
  • Pass --no-sandbox if necessary to bypass seccomp-enabled version of file(1) which was causing a huge number of regressions in our testing framework.
  • Don’t just run the tests but build the Debian package instead using Salsa’s centralised scripts so that we get code coverage, Lintian, autopkgtests, etc.
  • Update tests:
    • Don’t build release Git tags on salsa.debian.org.
    • Merge the debian branch into the master branch to simplify testing and deployment and update debian/gbp.conf to match.
  • Drop misleading and outdated MANIFEST and MANIFEST.SKIP files as they are not used by our release process.
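To give a flavour of what removing non-determinism from a .zip-like file can mean, the following Python sketch rewrites an archive with sorted member names, a fixed timestamp and fixed permissions. It is not how strip-nondeterminism itself works (that tool is written in Perl on top of Archive::Zip); the function name normalise_zip and the chosen timestamp and permissions are assumptions made for this example.

```python
import io
import zipfile


def normalise_zip(data, date_time=(1980, 1, 1, 0, 0, 0)):
    """Return a copy of the zip archive `data` (bytes) with members
    sorted by name and stored with a fixed timestamp and permissions."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source:
        with zipfile.ZipFile(output, "w") as target:
            for name in sorted(source.namelist()):
                info = zipfile.ZipInfo(name, date_time=date_time)
                info.compress_type = zipfile.ZIP_DEFLATED
                info.external_attr = 0o644 << 16  # fixed file mode
                target.writestr(info, source.read(name))
    return output.getvalue()


def make_example(timestamp):
    """Build a one-file archive whose member has the given timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(zipfile.ZipInfo("a.txt", date_time=timestamp), b"hello")
    return buffer.getvalue()


first = make_example((2019, 7, 1, 0, 0, 0))
second = make_example((2018, 1, 1, 12, 0, 0))
print(first == second, normalise_zip(first) == normalise_zip(second))
```

The two example archives differ only in the recorded modification time of their single member, and they become byte-for-byte identical once normalised.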


Test framework

We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. The following changes were performed in the last month:

  • Holger Levsen:
    • Debian-specific changes:
      • Make a large number of adjustments to support the new Debian bullseye distribution and the release of buster.
      • Fix the colours for the five suites now being built.
      • Make a number of code improvements to the calculation of our “metapackage” sets including refactoring and changes of email address, etc.
      • Add the “http-proxy” variable to the displayed node info.
    • Alpine changes:
      • Rebuild the webpages every two hours (instead of twice per hour).
    • Reproducible tooling:
      • Fix the detection of version number in Arch Linux.
      • Drop reprotest and strip-nondeterminism jobs as we run that via Salsa CI now.
      • Add a link to current SQL database schema.
  • Mattia Rizzolo:
    • Make a number of adjustments to support the new Debian bullseye distribution.
    • Ensure that our arm64 hosts always trust the Debian archive keyring.
    • Enable the backports repositories on the arm64 build hosts.

Holger Levsen and Mattia Rizzolo performed the usual node maintenance and, lastly, Vagrant Cascadian added support to generate a reproducible-tracker.json metadata file for the next release of Debian (bullseye).


On the mailing list

Chris Lamb cross-posted his reply to the “Re: file(1) now with seccomp support enabled” thread that was originally started on the debian-devel Debian list. In his post, he refers to strip-nondeterminism not being able to accommodate the additional security hardening in file(1) and the changes made to the tool in order to fix this issue, which was causing a huge number of regressions in our testing framework.

Matt Bearup wrote about his experience of generating different checksums for the libgcrypt20 package, which resulted in some pointers, including that one should use the equivalent .buildinfo post-build certificate when attempting to reproduce any particular build.
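Comparing the checksums of two independently built artifacts is the most basic reproducibility check, and can be sketched in Python as follows. The function names are hypothetical, and the choice of SHA-256 is an assumption of this example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large packages need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(path_a, path_b):
    """Return True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A mismatch does not say why two builds differ, only that they do; recreating the environment recorded in the .buildinfo file and then inspecting any remaining differences with diffoscope are the natural next steps.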

Vagrant Cascadian posted a request for comments regarding a potential proposal to the GNU Tools “Cauldron” gathering to be held in Montréal, Canada during September 2019 and Bernhard M. Wiedemann posed a query about using consistent terms on our webpages and elsewhere.

Lastly, in a thread titled “Reproducible Builds - aiming for bullseye: comments and a purpose” Jathan asked about whether we had considered offering “101”-like beginner sessions to fix packages that are not currently reproducible.



Getting in touch

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month’s report was written by Benjamin Hof, Bernhard M. Wiedemann, Chris Lamb, Holger Levsen and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



