Welcome to the July 2019 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.
The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
In July’s report, we cover:
- Front page — Media coverage, upstream news, etc.
- Distribution work — Shenanigans at DebConf19
- Software development — Software transparency, yet more diffoscope work, etc.
- On our mailing list — GNU tools, education and buildinfo files
- Getting in touch — … and how to contribute
If you are interested in contributing to our project, we enthusiastically invite you to visit our Contribute page on our website.
Nico Alt wrote a detailed and well-researched article titled “Trust is good, control is better” which discusses Reproducible builds in F-Droid, the alternative application repository for Android mobile phones. In contrast to the bigger commercial app stores, F-Droid only offers apps that are free and open source software. The post not only demonstrates using diffoscope but talks more generally about how reproducible builds can prevent single developers or other important centralised infrastructure from becoming targets for toolchain-based attacks.
Morten (“Foxboron”) Linderud published his academic thesis “Reproducible Builds: break a log, good things come in trees” which investigates and describes how transparency log overlays can provide additional security guarantees for computers automatically producing software packages. The thesis was part of Morten’s studies at the University of Bergen, Norway and is an extension of the work New York University Tandon School of Engineering has been doing with package rebuilder integration in APT.
Mike Hommey posted to his blog about Reproducing the Linux builds of Firefox 68, which takes advantage of the fact that builds shipped by Mozilla should be reproducible from this version. He discusses the problems caused by the builds being optimised with Profile-Guided Optimisation (PGO) but, armed with the now-published profiling data, Mike provides Docker-based instructions on how to reproduce the published builds yourself.
Joel Galenson has been making progress on a reproducible Rust compiler which includes support for a --remap-path-prefix argument, related to the concepts and problems involved in the BUILD_PATH_PREFIX_MAP proposal to fix issues with build paths being embedded in binaries.
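To illustrate the general idea behind remapping build paths, here is a minimal Python sketch. It is not the logic used by the Rust compiler, nor a parser for the BUILD_PATH_PREFIX_MAP format; the function name `remap_path` and the example directories are hypothetical. It only shows how replacing a machine-specific prefix with a fixed one makes the embedded path identical regardless of where the build took place.

```python
def remap_path(path, prefix_map):
    """Rewrite the leading directory of `path` using the first matching pair.

    `prefix_map` is a list of (old_prefix, new_prefix) tuples; this layout is
    an assumption made for the sketch, not a real tool's format.
    """
    for old_prefix, new_prefix in prefix_map:
        old = old_prefix.rstrip("/")
        new = new_prefix.rstrip("/")
        # Only match on whole path components, so that "/src/hello" does not
        # accidentally match "/src/hello2".
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path


# Two hypothetical developers building the same source in different places.
alice = remap_path("/home/alice/src/hello/src/main.rs",
                   [("/home/alice/src/hello", "/build/hello")])
bob = remap_path("/home/bob/work/hello/src/main.rs",
                 [("/home/bob/work/hello", "/build/hello")])
print(alice == bob)
```

Both builds end up recording the same path, so the build directory no longer leaks into the output.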
Lastly, Alessio Treglia posted to their blog about Cosmos Hub and Reproducible Builds which describes the reproducibility work happening in the Cosmos Hub, a network of interconnected blockchains. Specifically, Alessio talks about work being done on the Gaia development kit for the Hub.
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. Enabling Link Time Optimization (LTO) in this distribution’s “Tumbleweed” branch caused multiple issues due to the number of cores on the build host being added to the CFLAGS variable. This affected, for example, a debuginfo/rpm header and also resulted in CFLAGS appearing in some built binaries.
As highlighted in last month’s report, the OpenWrt project (a Linux operating system targeting embedded devices such as wireless network routers) hosted a summit in Hamburg, Germany. Their full summit report and roundup is now available and covers many general aspects of that distribution, including the work on reproducible builds that was done during the event.
It was an extremely productive time in Debian this month in and around DebConf19, the 20th annual conference for both contributors and users, which was held at the Federal University of Technology in Paraná (UTFPR) in Curitiba, Brazil, from July 21 to 28. The conference was preceded by “DebCamp” from the 14th until the 19th, with an additional “Open Day” targeted at the more-general public on the 20th.
There were a number of talks touching on the topic of reproducible builds and secure toolchains throughout the conference, including:
- Reproducible Builds - aiming for bullseye by Holger Levsen, Chris Lamb and Vagrant Cascadian.
- Software transparency: improving package manager security presented by Benjamin Hof.
- Software transparency BoF, an informal “birds of a feather” session to discuss and collect ideas around detecting compromised archives.
There were naturally countless discussions regarding Reproducible Builds in and around the conference on the questions of tooling, infrastructure and our next steps as a project.
The release of Debian 10 buster has also meant the release cycle for the next release (codenamed “bullseye”) has just begun. As part of this, the Release Team recently announced that Debian will no longer allow binaries built and uploaded by maintainers on their own machines to be part of the upcoming release. This is great news not only for toolchain security in general but also in that it will ensure that all binaries that will form part of this release will likely have a .buildinfo file and thus metadata that could be used by others to reproduce and verify the builds.
Holger Levsen filed a bug against the underlying tool that maintains the Debian archive (“dak”) after he noticed that .buildinfo metadata files were not being automatically propagated if packages had to be manually approved or processed in the so-called “NEW queue”. After it was pointed out that the files were being retained in a separate location, Benjamin Hof proposed a potential patch for the issue which is pending review.
David Bremner posted to his blog about “Yet another buildinfo database” that provides a SQL interface for querying .buildinfo attestation documents, particularly focusing on identifying packages that were built with a specific (and possibly buggy) build-dependency. Later at DebConf, David demonstrated his tool live (starting at 36:30).
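As a rough illustration of the kind of query such a database makes possible, the following Python sketch uses an in-memory SQLite database. The schema, table names, package names and versions are all hypothetical and are not taken from David's tool; the point is only to show how recorded build-dependencies can be searched for a suspect version.

```python
import sqlite3


def packages_built_with(conn, dep_name, dep_version):
    """Return (source, version) pairs whose build used the given dependency."""
    rows = conn.execute(
        """
        SELECT b.source, b.version
        FROM builds AS b
        JOIN build_depends AS d ON d.build_id = b.id
        WHERE d.package = ? AND d.version = ?
        ORDER BY b.source, b.version
        """,
        (dep_name, dep_version),
    )
    return rows.fetchall()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE builds (id INTEGER PRIMARY KEY, source TEXT, version TEXT);
    CREATE TABLE build_depends (build_id INTEGER, package TEXT, version TEXT);
    INSERT INTO builds VALUES
        (1, 'pkg-a', '1.0-1'), (2, 'pkg-b', '2.3-4'), (3, 'pkg-c', '0.9-2');
    INSERT INTO build_depends VALUES
        (1, 'example-compiler', '8.3.0-6'),
        (2, 'example-compiler', '8.3.0-7'),
        (3, 'example-compiler', '8.3.0-6');
    """
)

print(packages_built_with(conn, "example-compiler", "8.3.0-6"))
```

With a query like this, a list of packages that might need rebuilding after a toolchain bug is discovered can be produced directly from the recorded build metadata.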
Ivo de Decker (“ivodd”) scheduled rebuilds of over 600 packages that last experienced an upload to the archive in December 2016 or earlier. This was so that they would be built using a version of the low-level dpkg package build tool that supports the generation of reproducible binary packages. The effect of this on the main archive will be deliberately staggered and thus visible throughout the upcoming weeks, potentially resulting in some of these packages now failing to build.
Joaquin de Andres posted an update regarding the work being done on continuous integration on Debian’s GitLab instance at DebConf19 in which he mentions, inter alia, a tool called atomic-reprotest. This is a relatively new utility to help debug failures logged by our reprotest tool which attempts to test whether a build is reproducible or not. This tool was also mentioned in a subsequent lightning talk.
Chris Lamb filed two bugs to drop the test jobs for both strip-nondeterminism (#932366) and reprotest (#932374) after modifying them to build on the Salsa server’s own continuous integration platform, and Holger Levsen resolved them shortly afterwards.
Lastly, 63 reviews of Debian packages were added, 72 were updated and 22 were removed this month, adding to our large body of knowledge about identified issues. Chris Lamb also added and categorised four new issue types.
The goal of Benjamin Hof’s Software Transparency effort is to improve on the cryptographic signatures of the APT package manager by introducing a Merkle tree-based transparency log for package metadata and source code, in a similar vein to certificate transparency. This month, he pushed a number of repositories to our revision control system for further future development and review.
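To give a feel for why a Merkle tree is useful here, the following Python sketch computes a tree root over a list of log entries, using the leaf and node hashing scheme of certificate transparency (RFC 6962). It is not Benjamin's implementation, and the entry contents are hypothetical; it only shows that a single short hash commits to every entry in the log, so any later modification becomes detectable.

```python
import hashlib


def merkle_root(entries):
    """Compute a Merkle tree root over a list of byte strings.

    Leaves are hashed with a 0x00 prefix and interior nodes with a 0x01
    prefix, so that a leaf can never be confused with an interior node.
    """
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_root(entries[:k])
    right = merkle_root(entries[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()


# Hypothetical package metadata entries.
log = [b"pkg-a 1.0-1", b"pkg-b 2.3-4", b"pkg-c 0.9-2"]
tampered = [b"pkg-a 1.0-1", b"pkg-b 2.3-5", b"pkg-c 0.9-2"]

print(merkle_root(log).hex())
print(merkle_root(log) == merkle_root(tampered))
```

Because the root changes whenever any entry changes, clients that compare roots with each other can notice if they are being shown different views of the archive.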
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
HSAIL-Tools — Sort hash, submitted upstream.
MozillaFirefox, MozillaThunderbird — Fix a race condition.
zip shell call and
boswars — Sort Python
readdir, submitted upstream.
.png date & time, filed upstream.
calibre — Sort Python
colobot-data — Sort Python
filtlan — Fix LaTeX run, dropping an unreproducible log with date.
.png date & time.
gcc — Report Link-time optimisation-induced nondeterminism caused by using global constructors.
griefly — Sort Python
herbstluftwm — Use CMake
kitty — Sort filesystem.
.png date & time.
maven-javadoc-plugin — Report copyright using the current year.
mono — Report unreproducible
netpanzer — Sort SCons
.png date & time.
gzip -n in Debian package build.
open-iscsi — Fix nondeterministic
perl-File-Unpack — Fix a parallelism-induced race condition, submitted upstream.
python-futurist — Drop Python
python-nautilus — Python date, already filed upstream.
python-pyreadstat — Sort Python readdir.
python-scikit-image — Drop randomness from package.
python-slycot — Drop unreproducible
python-statsmodels — Drop unreproducible
trilinos — Sort a Perl
vdrift — Fix a date exposed by in SCons and Python.
wordwarvi — Adjust a date, already filed upstream.
worldofpadman — Fix a date, merged upstream.
yadex — Fix file modification times.
znc — Avoid parallelism race from
- Chris Lamb:
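Many of the patches listed above fall into two recurring categories: sorting the results of a directory listing and replacing the current date with a fixed one. The following Python sketch shows both patterns in their simplest form. The function names are hypothetical and the code is not taken from any of the patches; the SOURCE_DATE_EPOCH environment variable, however, is the convention documented by the Reproducible Builds project for this purpose.

```python
import os
import time


def list_inputs(directory):
    """Return directory entries in a stable order.

    os.listdir() returns entries in an arbitrary, filesystem-dependent
    order, so anything derived from it (archives, generated indexes)
    should sort the result first.
    """
    return sorted(os.listdir(directory))


def build_date(environ=os.environ):
    """Return the date to embed in build output, as YYYY-MM-DD in UTC.

    If SOURCE_DATE_EPOCH is set, use it instead of the current time so
    that rebuilding later yields the same value.
    """
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return time.strftime("%Y-%m-%d", time.gmtime(epoch))


print(build_date({"SOURCE_DATE_EPOCH": "0"}))  # 1970-01-01
```

Formatting the date in UTC rather than local time also avoids differences caused by the timezone of the build machine.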
Neal Gompa, Michael Schröder & Miro Hrončok responded to Fedora’s recent change to rpm-config with some new developments within rpm to fix an unreproducible “Build Date” and reverted a change to the Python interpreter to switch back to unreproducible/time-based compile caches.
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb made the following changes:
- Add support for Java .jmod modules (#60). However, not all versions of file(1) support detection of these files yet, so we perform a manual comparison instead […].
- If a command fails to execute but does not print anything to standard error, try and include the first line of standard output in the message we include in the difference. This was motivated by readelf(1) returning its error messages on standard output. (#59) […]
- Add general support for file(1) 5.37 (#57) but also adjust the code to not fail in tests when, e.g., we do not have a sufficiently newer or older version.
- Factor out the ability to ignore the exit codes of zipinfo -v in the presence of non-standard headers […] but only override the exit code from our special-cased calls to zipinfo(1) if they are 2 to avoid potentially masking real errors […].
- Cease ignoring test failures in
- Add missing textual
- Merge two overlapping environment variables into a single
- Update some reporting:
- Add some explicit return values to appease Pylint, etc. […]
- Also include the python3-tlsh in the Debian test dependencies. […]
- Released and uploaded versions 116, 117, 118, 119 & 120. […][…][…][…][…]
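The change above regarding commands that fail without printing anything to standard error can be sketched roughly as follows. This is not diffoscope's actual code; the function name `describe_failure` and the message format are hypothetical, and the sketch assumes Python 3.7 or later for the `capture_output` argument.

```python
import subprocess
import sys


def describe_failure(cmd):
    """Run `cmd` and return a one-line description if it failed, else None."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return None

    message = proc.stderr.strip()
    if not message:
        # Some tools report errors on standard output instead, so fall
        # back to the first line of stdout rather than saying nothing.
        lines = proc.stdout.splitlines()
        message = lines[0] if lines else ""

    return "Command failed with exit code {}: {}".format(proc.returncode, message)


# A stand-in for a tool that reports its error on standard output.
failing = [sys.executable, "-c",
           "print('oops'); print('more detail'); raise SystemExit(3)"]
print(describe_failure(failing))
```

Only the first line is used so that a tool producing a lot of output before failing does not flood the resulting report.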
There was yet more effort put into our website this month, including:
- Bernhard M. Wiedemann:
- Chris Lamb:
- Split out our non-fiscal sponsors with a description […] and make them non-display three-in-a-row […].
- Correct references to 1&1 IONOS (née Profitbricks). […]
- Reduce ambiguity in our environment names. […]
- Recreate the badge image, saving the .svg alongside it. […]
- Update our fiscal sponsors. […][…][…]
- Tidy the weekly reports section on the news page […], fixup the typography on the documentation page […] and make all headlines stand out a bit more […].
- Drop some old CSS files and fonts. […]
- Tidy news page a bit. […]
- Fixup a number of issues in the report template and previous reports. […][…][…][…][…][…]
Holger Levsen also added explanations on how to install diffoscope on OpenBSD […] and FreeBSD […] to its homepage and Arnout Engelen added a preliminary and work-in-progress idea for a badge or “shield” program for upstream projects. […][…][…].
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Niko Tyni provided a patch to use the Perl Sub::Override library for some temporary workarounds for issues in Archive::Zip instead of Monkey::Patch which was due for deprecation. […]
In addition, Chris Lamb made the following changes:
- Identify data files from the COmmon Data Access (CODA) framework as being
- Support OpenJDK “.jmod” files. […]
--no-sandbox if necessary to bypass seccomp-enabled version of file(1) which was causing a huge number of regressions in our testing framework.
- Don’t just run the tests but build the Debian package instead using Salsa’s centralised scripts so that we get code coverage, Lintian, autopkgtests, etc. […][…]
- Update tests:
- Drop misleading and outdated MANIFEST.SKIP files as they are not used by our release process. […]
- Holger Levsen:
- Debian-specific changes:
- Make a large number of adjustments to support the new Debian bullseye distribution and the release of buster. […][…][…][…][…][…][…] […][…][…][…]
- Fix the colours for the five suites now being built. […]
- Make a number of code improvements to the calculation of our “metapackage” sets including refactoring and changes of email address, etc. […][…][…][…][…]
- Add the “http-proxy” variable to the displayed node info. […]
- Alpine changes:
- Rebuild the webpages every two hours (instead of twice per hour). […]
- Reproducible tooling:
- Debian-specific changes:
- Mattia Rizzolo:
Holger Levsen […][…][…] and Mattia Rizzolo […][…][…] performed the usual node maintenance and lastly, Vagrant Cascadian added support to generate a reproducible-tracker.json metadata file for the next release of Debian (bullseye). […]
On the mailing list
Chris Lamb cross-posted his reply to the “Re: file(1) now with seccomp support enabled” thread that was originally started on the debian-devel Debian list. In his post, he refers to strip-nondeterminism not being able to accommodate the additional security hardening in file(1) and the changes made to the tool in order to fix this issue, which was causing a huge number of regressions in our testing framework.
Matt Bearup wrote about his experience when he generated different checksums for the libgcrypt20 package, which resulted in some pointers, including that one should be using the equivalent .buildinfo post-build certificate when attempting to reproduce any particular build.
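As a rough sketch of what checking a rebuilt package against a .buildinfo file involves, the Python below extracts the entries of a Checksums-Sha256 field and compares them against the hash of a locally built artifact. The sample file contents and the helper names are hypothetical, and real .buildinfo files contain many more fields (and are usually signed), so this is an illustration of the comparison step only.

```python
import hashlib


def parse_sha256_checksums(buildinfo_text):
    """Map filename -> SHA-256 hex digest from a Checksums-Sha256 field."""
    checksums = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line.startswith(" "):
            # Continuation lines have the form: " <digest> <size> <filename>"
            digest, _size, filename = line.split()
            checksums[filename] = digest
        else:
            in_field = False
    return checksums


def matches_buildinfo(filename, data, buildinfo_text):
    """Return True if `data` hashes to the value recorded for `filename`."""
    expected = parse_sha256_checksums(buildinfo_text).get(filename)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected


# Hypothetical artifact and a minimal, made-up buildinfo-like document.
artifact = b"hypothetical package contents"
sample = (
    "Format: 1.0\n"
    "Source: example\n"
    "Checksums-Sha256:\n"
    " {} {} example_1.0-1_amd64.deb\n"
    "Build-Architecture: amd64\n"
).format(hashlib.sha256(artifact).hexdigest(), len(artifact))

print(matches_buildinfo("example_1.0-1_amd64.deb", artifact, sample))
print(matches_buildinfo("example_1.0-1_amd64.deb", artifact + b"!", sample))
```

A mismatch does not by itself say which side is wrong, but it is the signal that the build environment recorded in the .buildinfo file should be compared with the one used for the rebuild.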
Vagrant Cascadian posted a request for comments regarding a potential proposal to the GNU Tools “Cauldron” gathering to be held in Montréal, Canada during September 2019 and Bernhard M. Wiedemann posed a query about using consistent terms on our webpages and elsewhere.
Lastly, in a thread titled “Reproducible Builds - aiming for bullseye: comments and a purpose” Jathan asked about whether we had considered offering “101”-like beginner sessions to fix packages that are not currently reproducible.
Getting in touch
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
This month’s report was written by Benjamin Hof, Bernhard M. Wiedemann, Chris Lamb, Holger Levsen and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.