Reproducible Builds in February 2020

View all our monthly reports


Welcome to the February 2020 report from the Reproducible Builds project.

One of the original promises of open source software is that distributed peer review and transparency of process result in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes.

The motivation behind the reproducible builds effort is to provide the ability to demonstrate that these binaries originated from a particular, trusted source release: if identical results are generated from a given source in all circumstances, reproducible builds provides the means for multiple third parties to reach a consensus on whether a build was compromised, via distributed checksum validation or some other scheme.
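To make the idea of distributed checksum validation concrete, here is a minimal Python sketch. It assumes that several independent builders each publish a SHA-256 digest of the artifact they built; the builder names, the `consensus` helper and the sample data are hypothetical and are not part of any Reproducible Builds tool.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of an artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def consensus(reports: dict[str, str]) -> tuple[str, list[str]]:
    """Given builder name -> digest, return (majority digest, dissenting builders)."""
    counts = Counter(reports.values())
    majority_digest, _ = counts.most_common(1)[0]
    dissenters = sorted(
        name for name, digest in reports.items() if digest != majority_digest
    )
    return majority_digest, dissenters


# Hypothetical data: two builders agree, a third produced a different binary.
artifact = b"hello reproducible world\n"
tampered = artifact + b"# injected\n"
reports = {
    "builder-a": sha256_hex(artifact),
    "builder-b": sha256_hex(artifact),
    "builder-c": sha256_hex(tampered),
}

digest, dissenters = consensus(reports)
print("majority digest:", digest)
print("builders that disagree:", dissenters)
```

A simple majority vote is only one possible scheme; it is used here because it is the shortest way to show that a single divergent build stands out once builds are bit-for-bit reproducible.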

In this month’s report, we cover:

  • Media coverage & upstream news: A new paper on reproducible containers, Ruby updates, etc.
  • Distribution work: More work in Debian, openSUSE & friends.
  • Software development: Updates and improvements to our tooling.
  • Getting in touch: How to contribute, more venues for discussion.

If you are interested in contributing to the project, please visit our Contribute page on our website.


Media coverage & upstream news

Omar Navarro Leija, a PhD student at the University Of Pennsylvania, published a paper entitled Reproducible Containers that describes in detail the workings of a new user-space container tool called DetTrace:

All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault-tolerance, reproducible software builds and reproducible data analytics. We use DetTrace to achieve, in an automatic fashion, reproducibility for 12,130 Debian package builds, containing over 800 million lines of code, as well as bioinformatics and machine learning workflows.

There was also considerable discussion on our mailing list regarding this research, and a presentation based on the paper will be given at the ASPLOS 2020 conference, held from March 16th to 20th in Lausanne, Switzerland.

The many virtues of Reproducible Builds were touted as benefits for software compliance in a talk at FOSDEM 2020 that debated whether the Careful Inventory of Licensing Bill of Materials Have Impact of FOSS License Compliance, pitting Jeff McAffer and Carol Smith against Bradley Kuhn and Max Sills (roughly 47 minutes in).

Nobuyoshi Nakada updated the canonical implementation of the Ruby programming language such that filesystem globs (i.e. calls to list the contents of filesystem directories) will henceforth be sorted in ascending order. Without this change, the underlying nondeterministic ordering of the filesystem is exposed to the language, which often results in an unreproducible build.
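The same class of problem exists in other languages. As a rough Python analogue (this is not the Ruby change itself), `os.listdir` makes no promise about the order of the names it returns, so a build step that depends on that order should sort explicitly. The `list_sources` helper and the file names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def list_sources(directory: str) -> list[str]:
    """List directory entries in a stable order.

    os.listdir() returns names in arbitrary, filesystem-dependent order,
    so we sort before anything downstream can depend on that order.
    """
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in a deliberately non-alphabetical order.
    for name in ("b.c", "a.c", "c.c"):
        Path(tmp, name).touch()
    ordered = list_sources(tmp)

print(ordered)
```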

Vagrant Cascadian reported on our mailing list regarding a quick reproducibility test for the GNU Guix distribution, which resulted in 81.9% of packages registering as reproducible in his installation:

$ guix challenge --verbose --diff=diffoscope ...
2,463 store items were analyzed:
  - 2,016 (81.9%) were identical
  - 37 (1.5%) differed
  - 410 (16.6%) were inconclusive
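As a quick sanity check of the figures above, the percentages can be recomputed from the raw counts. The short Python sketch below uses only the numbers printed by the command; the variable names are ours.

```python
# Counts taken from the output above.
identical, differed, inconclusive = 2016, 37, 410
total = identical + differed + inconclusive


def percent(count: int) -> str:
    """Format a count as a percentage of the total, to one decimal place."""
    return f"{100 * count / total:.1f}%"


print(total, percent(identical), percent(differed), percent(inconclusive))
```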

Jeremiah Orians announced on our mailing list the release of a number of tools related to cross-compilation, such as M2-Planet and mescc-tools-seed. This project attempts a full bootstrap of a cross-platform compiler for the C programming language (written in C itself) from hex, the ultimate goal being to demonstrate a fully-bootstrapped compiler from hex to the GNU Compiler Collection (GCC). This has many implications in and around Ken Thompson’s Trusting Trust attack, outlined in Thompson’s 1983 Turing Award Lecture.

Twitter user @TheYoctoJester posted an executive summary of reproducible builds in the Yocto Project.

Finally, Reddit user tofflos posted to the /r/Java subreddit asking how to achieve reproducible builds with Maven, and Chris Lamb noticed that the Linux kernel’s documentation about building it reproducibly is available on the kernel.org homepages in an attractive HTML format.


Distribution work

Debian

Chris Lamb created a merge request for the core debian-installer package to allow all arguments and options from sources.list files (such as “[check-valid-until=no]”, etc.) in order that we can test the reproducibility of the installer images on the Reproducible Builds’ own testing infrastructure. (#13)

Thorsten Glaser followed up on a bug against the dpkg-source component, originally filed in late 2015, which claims that the build tool does not respect permissions when unpacking tarballs if the umask is set to 0002.
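For readers unfamiliar with why the umask matters here, the following Python sketch shows the standard POSIX rule that the permission bits of a newly created file are the requested mode with the umask bits cleared. It is only an arithmetic illustration and does not reproduce dpkg-source’s behaviour; `effective_mode` is a hypothetical helper.

```python
def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file receives under a given umask."""
    return requested & ~umask


# The same requested mode yields different on-disk permissions
# depending on the umask of the environment doing the unpacking.
common_default = effective_mode(0o666, 0o022)
group_writable = effective_mode(0o666, 0o002)

print(oct(common_default), oct(group_writable))
```

Two otherwise identical environments that differ only in their umask can therefore end up with different file modes, which is one way such a difference can leak into a build.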

Matthew Garrett posted to the debian-devel mailing list on the topic of “Producing verifiable initramfs images” as part of a wider conversation on being able to trust the entire software stack on our computers.

59 reviews of Debian packages were added, 30 were updated and 42 were removed this month adding to our knowledge about identified issues. Many issue types were noticed and categorised by Chris Lamb, including:

openSUSE

In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update and provided the following patches:


Software development

diffoscope

diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of nondeterministic behaviour.

Chris Lamb made the following changes this month, including uploading version 137 to Debian:

  • The sng image utility appears to return with an exit code of 1 if there are even minor errors in the file. (#950806)
  • Also extract classes2.dex, classes3.dex from .apk files extracted by apktool. (#88)
  • No need to use str.format if we are just returning the string.
  • Add generalised support for “ignoring” returncodes and move special-casing of returncodes in zip to use Command.VALID_RETURNCODES.

Other tools

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Vagrant Cascadian updated the Vcs-Git to specify the Debian packaging branch.
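The principle behind this kind of testing can be sketched in a few lines of Python. The simulation below does not use disorderfs or FUSE at all: it merely shuffles an in-memory directory listing with different seeds and shows that a step which hashes names in listing order gives varying results, whereas one that sorts first does not. All names are hypothetical.

```python
import hashlib
import random


def shuffled_listing(entries: list[str], seed: int) -> list[str]:
    """Return the entries in a pseudo-random order, as a shuffling filesystem might."""
    rng = random.Random(seed)
    result = list(entries)
    rng.shuffle(result)
    return result


def archive_digest(entries: list[str], sort_first: bool) -> str:
    """Hash the entry names in the order they are processed."""
    names = sorted(entries) if sort_first else entries
    return hashlib.sha256("\n".join(names).encode()).hexdigest()


entries = ["a.o", "b.o", "c.o", "d.o"]
listings = [shuffled_listing(entries, seed) for seed in range(20)]

# Distinct digests seen across the shuffled listings.
fragile = {archive_digest(listing, sort_first=False) for listing in listings}
robust = {archive_digest(listing, sort_first=True) for listing in listings}

print(len(fragile), "digests without sorting;", len(robust), "with sorting")
```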

reprotest is our end-user tool that builds the same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, versions 0.7.13 and 0.7.14 were uploaded to Debian unstable by Holger Levsen after Vagrant Cascadian added support for GNU Guix.
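The build-twice-and-compare idea can be illustrated with a toy Python sketch. It does not invoke reprotest or reflect its implementation; the “builds” are plain functions that receive an environment mapping, and every name below is an assumption made for the example.

```python
import hashlib
from typing import Callable, Mapping

Build = Callable[[Mapping[str, str]], bytes]


def is_reproducible(build: Build, env_a: Mapping[str, str], env_b: Mapping[str, str]) -> bool:
    """Run the build in two environments and compare digests of the outputs."""
    digest_a = hashlib.sha256(build(env_a)).hexdigest()
    digest_b = hashlib.sha256(build(env_b)).hexdigest()
    return digest_a == digest_b


def clean_build(env: Mapping[str, str]) -> bytes:
    """Toy build whose output ignores the environment."""
    return b"binary-v1"


def leaky_build(env: Mapping[str, str]) -> bytes:
    """Toy build that embeds the timezone into its output."""
    return b"binary-v1 built in " + env["TZ"].encode()


# Two deliberately different environments.
env_a = {"TZ": "UTC", "LANG": "C"}
env_b = {"TZ": "Asia/Tokyo", "LANG": "fr_FR.UTF-8"}

print("clean:", is_reproducible(clean_build, env_a, env_b))
print("leaky:", is_reproducible(leaky_build, env_a, env_b))
```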

Project documentation & website

There was more work performed on our documentation and website this month. Bernhard M. Wiedemann added a Java Gradle Build Tool snippet to the SOURCE_DATE_EPOCH documentation and normalised various terms to “unreproducible”.

Chris Lamb added a Meson.build example to the SOURCE_DATE_EPOCH documentation and improved its CMake section, replaced “anyone can” with “anyone may” (as, well, not everyone has the resources, skills, time or funding to actually do what it refers to) and improved the pre-processing for our report generation.
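For context, SOURCE_DATE_EPOCH is an environment variable holding a Unix timestamp that build tools are expected to use in place of the current time. The Python sketch below shows one common way to honour it; it is not one of the snippets added this month, and the `build_timestamp` helper and the sample value are ours.

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping


def build_timestamp(environ: Mapping[str, str] = os.environ) -> datetime:
    """Return the timestamp to embed in build output.

    Prefer SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set,
    and fall back to the current time otherwise.
    """
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# Example value chosen for illustration only.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1582934400"})
print(stamp.isoformat())
```

Using UTC explicitly avoids the timestamp being rendered differently depending on the builder’s local timezone.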

In addition, Holger Levsen updated our news page to improve the list of reports, added an explicit mention of the weekly news time span and reverted the sorting of news entries to have the latest on top. Mattia Rizzolo added Codethink as a non-fiscal sponsor and, lastly, Tianon Gravi added a Docker Images link underneath the “Debian” project on our “Projects” page.

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Vagrant Cascadian submitted patches via the Debian bug tracking system targeting the packages the Civil Infrastructure Platform has identified via the “CIP” and “CIP build depends” package sets:

Testing framework

We operate a fully-featured and comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made by Holger Levsen:

In addition, Mattia Rizzolo added an Apache web server redirect for buildinfos.debian.net and reverted the reshuffling of arm64 architecture builders. The usual build node maintenance was performed by Holger Levsen, Mattia Rizzolo and Vagrant Cascadian.


Getting in touch

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:



This month’s report was written by Bernhard M. Wiedemann, Chris Lamb and Holger Levsen. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



