Welcome to the April 2019 report from the Reproducible Builds project! In these now-monthly reports we will outline the most important things which we have been up to in and around the world of reproducible builds & secure toolchains.
As a quick recap, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users pre-compiled. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
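As a rough illustration of the consensus idea, the sketch below compares SHA-256 checksums of artifacts produced by independent builders from the same source. The function names and builder labels are assumptions made for this example and are not part of any Reproducible Builds tool.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_consensus(artifacts: dict) -> tuple:
    """Compare artifacts built independently from the same source.

    `artifacts` maps a (hypothetical) builder name to the bytes it produced.
    Returns (all_agree, dissenters), where `dissenters` lists the builders
    whose checksum differs from the most common one.
    """
    if not artifacts:
        raise ValueError("at least one artifact is required")
    digests = {name: sha256_hex(data) for name, data in artifacts.items()}
    most_common_digest, _ = Counter(digests.values()).most_common(1)[0]
    dissenters = sorted(n for n, d in digests.items() if d != most_common_digest)
    return (not dissenters, dissenters)
```

If every builder reports the same checksum, the build can be considered reproduced; a dissenting checksum signals either a reproducibility bug or a build that deserves closer inspection.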
In this month’s report, we will cover:
- Media coverage — Compromised toolchains, what makes a good digital product?, etc.
- Upstream news — Scala and Go working on reproducibility, etc.
- Distribution work — Distributing build certificates, an update from openSUSE, etc.
- Software development — New features in diffoscope, yet more test framework development, etc
- Misc news — From our mailing list, etc.
- Getting in touch — How to contribute, etc
- The SecureList website reported on Operation “ShadowHammer”, a high-profile supply chain attack involving the ASUS Live Update Utility. As their post describes in more detail, tampering with binaries would usually break the digital signature, but in this case the digital signature itself appeared to have been compromised. (Read more)
Linux Weekly News (LWN) covered the recent bootstrap-sass backdoor incident which speaks to the impact and potential prevalence of supply-chain and mirror-based attacks. David A. Wheeler also published an essay on the incident that explicitly proposes reproducible builds as a potential way to reduce the impact of such attacks in the future.
There was an interesting discussion on Hacker News regarding the release of WAPM, a package manager for WebAssembly packages (typically embedded into browsers and web-pages). In the discussion, commenter whyrusleeping raised a query about the distinction between being able to reproduce any generated packages and simply signing packages in the usual manner, which received a warm reception from the upstream authors.
An issue was reported against the libsodium crypto library which asked for clarification why the 1.0.17 release was modified on the download server. In response to this, a pull request was created by Philip Crockett to verify the project with the minisign algorithm instead of
Bobby Richter proposed the addition of reproducible builds as an indicator of good digital products.
Anmol Sarma wrote a blog post requesting that developers “Stop Memsetting Structures”. This is relevant to the Project as the memset(3) library function is often used to ensure deterministic output of packages or of binaries themselves; if the build artifacts contain the contents of uninitialised memory, a developer would typically “zero out” the memory using memset(3) so that it does not contain so-called random data.
The first non-trivial library written in the Scala programming language on the Java Virtual Machine was released with Arnout Engelen’s sbt-reproducible-builds plugin enabled during the build. This resulted in Akka 2.5.22 becoming reproducible, both for the artifacts built with version 2.12.8 and 2.13.0-RC1 of the Scala compiler. For 2.12.8, the original release was performed on a Mac and the validation was done on a Debian-based machine, so it appears the build is reproducible across diverse systems. (Mailing list thread)
Jeremiah “DTMB” Orians announced the 1.3.0 release of M2-Planet, a self-hosting C compiler written in a subset of the features it supports. It has been bootstrapped entirely from hexadecimal (!) with 100% reproducible output/binaries. This new release sports a self-hosting port for an additional architecture amongst other changes. Being “self-hosted” is an important property as it can provide a method of validating the legitimacy of the build toolchain.
The Go programming language has been making progress in making its builds reproducible. In 2016, Ximin Luo created issue #16860 requesting that the compiler generate the same result regardless of the path in which the package is built. Progress was recently made in Change #173344 (and adjacent) that will permit a -trimpath mode that will generate binaries that do not contain any local path names, similar to
The fontconfig library for configuring and customising font access in a number of distributions announced they had merged patches to allow various cache files to be reproducible. This follows a historical summary and a request for action that Chris Lamb posted to Fontconfig’s mailing list in January 2019.
In Debian, Chris Lamb added 90 reviews of Debian packages, adding to our knowledge about identified issues, and 14 issues were automatically removed. Chris also added two issue types:
Holger Levsen started a discussion regarding the distribution of .buildinfo files. These files record the environment that was used for a particular build so that, together with the source code, that environment can be recreated at a later date to reproduce the exact binary. Distributing these files is important so that others can validate that a build is actually reproducible. In his post, Holger refers to two services that now exist, buildinfo.debian.net and buildinfos.debian.net.
In addition, Holger restarted a long-running discussion regarding the reproducibility status of Debian buster, touching on questions of potentially performing mass rebuilds of all packages so that they use updated toolchains.
There was yet more progress towards making the Debian Installer images reproducible. Following on from last month, Chris Lamb performed some further testing of the generated images. Cyril Brulebois then made an upload of the debian-installer package to Debian that included a number of Chris’ patches and Vagrant Cascadian filed a patch to fix the reproducibility of “u-boot” images by using the -n argument to
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. Bernhard also posted to our mailing list regarding enabling the normalisation of file modification times in Python .pyc files and opened issue #1133809 in the openSUSE bug tracker.
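To make the .pyc point concrete, here is a minimal sketch of one way to normalise the modification time embedded in a compiled file: clamp the source file's mtime to a fixed epoch before compiling. It assumes Python 3.7 or later, where the timestamp sits at byte offset 8 of the .pyc header. The helper names and the example epoch are made up for illustration, and this is not necessarily the approach proposed on the mailing list.

```python
import os
import py_compile
import struct


def compile_with_clamped_mtime(source_path: str, epoch: int) -> str:
    """Compile `source_path`, never embedding an mtime newer than `epoch`."""
    if os.stat(source_path).st_mtime > epoch:
        # Clamp the source's timestamps so the value copied into the
        # .pyc header no longer depends on when the build happened.
        os.utime(source_path, (epoch, epoch))
    return py_compile.compile(
        source_path,
        cfile=source_path + "c",  # e.g. module.py -> module.pyc
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )


def embedded_mtime(pyc_path: str) -> int:
    """Read the source mtime stored in a timestamp-based .pyc header.

    Header layout assumed here (Python 3.7+): 4 bytes magic, 4 bytes flags,
    4 bytes mtime, 4 bytes source size, all little-endian.
    """
    with open(pyc_path, "rb") as handle:
        header = handle.read(16)
    return struct.unpack("<I", header[8:12])[0]
```

In a real build the epoch would typically come from the SOURCE_DATE_EPOCH environment variable rather than being passed in by hand.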
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - dovecot23 (Report random build failure with -j1)
  - rash (Report parallelism-related nondeterminism)
  - maven-compiler-plugin (Report copyright year / date)
  - python-textX (Sort Python glob)
  - python-autobahn (CPU-detection - consider upstreaming some variant)
  - python-pocketsphinx-python (upstream) (sort Python
  - python-py-ubjson (upstream) (sort Python
  - branding-openSUSE (Rediscovered already-fixed parallelism race)
  - python-irc (Drop file with varying
- Chris Lamb:
- Vagrant Cascadian:
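Several of the patches above sort the result of a Python glob. The underlying issue is that glob.glob() returns paths in whatever order the filesystem provides, which can differ between build machines. A minimal sketch of the fix follows; the function name and default pattern are hypothetical and not taken from any of the patched packages.

```python
import glob
import os


def list_sources(directory: str, pattern: str = "*.txt") -> list:
    """Return matching paths in a stable, sorted order.

    glob.glob() makes no ordering guarantee, so sorting the result is what
    makes anything derived from the list independent of the build machine.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Anything generated by iterating over this list (a manifest, a concatenated file, an archive) then comes out in the same order on every build.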
diffoscope is our in-depth “diff-on-steroids” utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides helpful, human-readable guidance for packages that are not reproducible, rather than relying on essentially useless diffs.
This month, Chris Lamb did a lot of development of diffoscope, including:
- Add the ability to treat missing tools as failures if a “magic” environment variable is detected in order to facilitate interpreting required tools on the Debian autopkgtests as actual test failures, rather than skipping them. The behaviour of the existing testsuite remains unchanged. (#905885)
- Consolidate on a single alias as the exception value across the entire codebase. […]
In addition, Vibhu Agrawal ensured that diffoscope fails more gracefully when running out of disk space to resolve Debian bug #874582 and Vagrant Cascadian updated to diffoscope 114 in GNU Guix. Thanks!
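The “missing tools as failures” behaviour can be pictured with the following sketch. It is not diffoscope's actual code: the environment variable name and the helper are hypothetical, chosen only to show how a test can skip by default yet fail hard when a controlling variable is set.

```python
import os
import shutil
import unittest

# Hypothetical variable name; the report does not name the real one.
FAIL_ENV_VAR = "EXAMPLE_MISSING_TOOLS_ARE_FAILURES"


def require_tool(name: str, environ=os.environ) -> None:
    """Ensure an external tool is available before running a test.

    By default a missing tool skips the test. When FAIL_ENV_VAR is set to a
    non-empty value, a missing tool is reported as a failure instead.
    """
    if shutil.which(name) is not None:
        return
    if environ.get(FAIL_ENV_VAR):
        raise AssertionError(f"required tool {name!r} is not installed")
    raise unittest.SkipTest(f"tool {name!r} is not installed")
```

Because the default path still skips, an existing test suite behaves as before unless the variable is set explicitly, for example in a CI environment where every tool is expected to be present.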
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Chris Lamb made the following improvements:
- Work around Archive::Zip’s incorrect handling of the localExtraField class member field by monkey-patching the accessor methods to always return normalised values. This fixes the normalisation of Unix ownership metadata within
- Actually check the return status from Archive::Zip when writing files to disk. […]
- Catch an edge-case where we can’t parse the length of a particular field within
Chris then uploaded version 1.1.3-1 to the Debian experimental distribution.
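To give a feel for the kind of normalisation involved, the sketch below rewrites a zip archive with fixed timestamps, fixed permissions, no extra fields and a sorted member order. It uses Python's zipfile module purely for illustration; strip-nondeterminism itself is written in Perl on top of Archive::Zip, and the function name, the chosen timestamp and the decision to sort members are assumptions of this example.

```python
import io
import zipfile

# Arbitrary fixed timestamp used for every member (the earliest a zip allows).
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def normalise_zip(data: bytes) -> bytes:
    """Return a copy of a zip archive with build-dependent metadata removed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for entry in sorted(src.infolist(), key=lambda e: e.filename):
            # A fresh ZipInfo carries no extra fields, so Unix ownership
            # metadata stored in the original entry is dropped.
            info = zipfile.ZipInfo(entry.filename, date_time=FIXED_DATE_TIME)
            info.create_system = 3  # always claim "Unix", whatever the host
            if entry.is_dir():
                info.compress_type = zipfile.ZIP_STORED
                info.external_attr = (0o40755 << 16) | 0x10
                dst.writestr(info, b"")
            else:
                info.compress_type = zipfile.ZIP_DEFLATED
                info.external_attr = 0o100644 << 16
                dst.writestr(info, src.read(entry))
    return out.getvalue()
```

Two archives holding the same files, created at different times or in a different order, should come out byte-for-byte identical after this pass when processed by the same Python and zlib versions.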
Chris Lamb made a number of improvements to our project website this month, including:
Adding a simple “lint” command so we can see how many pages are using the old style. […]
Moving various bits of infrastructure to support a monthly report structure. […]
Holger Levsen (Debian-related changes):
- Add new experimental buildinfos.debian.net service. […][…][…]
- Allow pushing of .buildinfo files from coccia. […]
- Permit rsync to write into subdirectories. […]
- Include the meta “pool” job in the overall job health view. […]
- Add support for host-specific SSH authorized_keys files used on a particular build node. […]
- Show link to maintenance jobs for offline nodes. […][…]
- Increase the job timeout for some runners from 3 to 5 days. […]
- Don’t try to turn Jenkins or nodes offline too quickly. […][…]
- Fix pbuilder lock files if necessary. […]
- Special-case the debian-installer package when building to allow it access to the internet. […]
- Force installing the stretch backports and remove
- Install the python3-yaml package on nodes as it is needed by the deploy script. […]
- Add/update the new reproducible-builds.org MX records. […][…]
- Fix typo in comment; thanks to ijc for reporting! […]
- Special-case the
Holger Levsen […][…][…], Mattia Rizzolo […] and Vagrant Cascadian […] all performed a large amount of build node maintenance, system & Jenkins administration and Chris Lamb provided a patch to avoid double spaces in IRC notifications […].
AJ Jordan updated reprotest, our “end-user” tool to build arbitrary software and check it for reproducibility, to reference --store-dir text. […]
Whilst the Reproducible Builds project intended to participate in Google Summer of Code and Outreachy in 2019, we sadly did not find any suitable students. We do plan to be involved in future rounds wherever possible.
Chris Lamb noticed that the SUSv3/POSIX UNIX specification mentions that, for portability reasons, the character string that identifies the timezone description should begin with a colon character, which may have future implications for pinning a particular timezone in order to ensure a reproducible build.
Getting in touch
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
This month’s report was written by Arnout Engelen, Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Mattia Rizzolo and Vagrant Cascadian & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.