Reproducible Builds in May 2024

View all our monthly reports


Welcome to the May 2024 report from the Reproducible Builds project! In these reports, we try to outline what we have been up to over the past month and highlight news items in software supply-chain security more broadly. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Table of contents:

  1. A peek into build provenance for Homebrew
  2. Distribution news
  3. Mailing list news
  4. Miscellaneous news
  5. Two new academic papers
  6. diffoscope
  7. Website updates
  8. Upstream patches
  9. Reproducibility testing framework


A peek into build provenance for Homebrew

Joe Sweeney and William Woodruff wrote an extensive post on the Trail of Bits blog about build provenance for Homebrew, the third-party package manager for macOS. Their post details how each “bottle” (i.e. each release):

[…] built by Homebrew will come with a cryptographically verifiable statement binding the bottle’s content to the specific workflow and other build-time metadata that produced it. […] In effect, this injects greater transparency into the Homebrew build process, and diminishes the threat posed by a compromised or malicious insider by making it impossible to trick ordinary users into installing non-CI-built bottles.

The post also briefly touches on future work, including work on source provenance:

Homebrew’s formulae already hash-pin their source artifacts, but we can go a step further and additionally assert that source artifacts are produced by the repository (or other signing identity) that’s latent in their URL or otherwise embedded into the formula specification.
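As a rough illustration of what “hash-pinning” a source artifact means, the sketch below checks downloaded bytes against a pinned SHA-256 digest. It is written in Python for brevity and is not Homebrew's implementation; the function name and sample data are hypothetical.

```python
import hashlib


def verify_pinned_hash(data: bytes, expected_sha256: str) -> bool:
    """Return True if the SHA-256 digest of `data` matches the pinned value."""
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected_sha256.lower()


# Hypothetical artifact and pin; in practice the pin is recorded in the
# package definition and the data comes from the download.
artifact = b"example source tarball contents"
pinned = hashlib.sha256(artifact).hexdigest()

ok = verify_pinned_hash(artifact, pinned)
tampered = verify_pinned_hash(artifact + b"!", pinned)
```

A hash pin only says that the bytes are the ones the packager saw; the source provenance work quoted above would additionally assert who produced them.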


Distribution news

In Debian this month, Johannes Schauer Marin Rodrigues (aka josch) noticed that the Debian binary package bash version 5.2.15-2+b3 was “uploaded to the archive twice. Once to bookworm and once to sid but with differing content.” This is a problem for reproducible builds in Debian due to its assumption that the package name, version and architecture triplet is unique. However, josch highlighted that:

This example with bash is especially problematic since bash is Essential:yes, so there will now be a large portion of .buildinfo files where it is not possible to figure out with which of the two differing bash packages the sources were compiled.

In response to this, Holger Levsen performed an analysis of all .buildinfo files and found that almost 1,500 binNMUs are needed to fix the fallout from this bug.
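The uniqueness assumption described above can be sketched as a simple check: group packages by their (name, version, architecture) triplet and flag any triplet that maps to more than one checksum. This is only an illustration in Python; the function name, the architecture and the checksums below are placeholders, not values from the Debian archive.

```python
from collections import defaultdict


def find_conflicting_triplets(packages):
    """Return triplets that are associated with more than one checksum.

    `packages` is an iterable of (name, version, architecture, checksum).
    """
    seen = defaultdict(set)
    for name, version, arch, checksum in packages:
        seen[(name, version, arch)].add(checksum)
    return {
        triplet: sorted(checksums)
        for triplet, checksums in seen.items()
        if len(checksums) > 1
    }


# Hypothetical data: checksums and architecture are placeholders.
archive = [
    ("bash", "5.2.15-2+b3", "amd64", "aaaa"),  # e.g. the bookworm upload
    ("bash", "5.2.15-2+b3", "amd64", "bbbb"),  # e.g. the sid upload
    ("hello", "1.0-1", "amd64", "cccc"),       # made-up unaffected package
]
conflicts = find_conflicting_triplets(archive)
```

A .buildinfo file that records only the triplet cannot say which of the two conflicting packages was installed at build time, which is why the affected packages need rebuilding.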

Elsewhere in Debian, Vagrant Cascadian posted about a Non-Maintainer Upload (NMU) sprint to take place during early June, and it was announced that there is now a #debian-snapshot IRC channel on OFTC to discuss the creation of a new source code archiving service to, perhaps, replace snapshot.debian.org. Lastly, 11 reviews of Debian packages were added, 15 were updated and 48 were removed this month, adding to our extensive knowledge about identified issues. Chris Lamb also updated a number of issue types.


Elsewhere in the world of distributions, deep within a larger announcement from Colin Percival about the release of version 14.1-BETA2, it was mentioned that the FreeBSD kernels are now built reproducibly.


In Fedora, meanwhile, the change proposal mentioned in our report for April 2024 was approved, so, per the ReproduciblePackageBuilds wiki page, the add-determinism tool is now running in new builds for Fedora 41 (‘rawhide’). The add-determinism tool is a Rust program which, as its name suggests, adds determinism to files that are given as input by “attempting to standardize metadata contained in binary or source files to ensure consistency and clamping to $SOURCE_DATE_EPOCH in all instances”. This is essentially the Fedora version of Debian’s strip-nondeterminism. However, strip-nondeterminism is written in Perl, and Fedora did not want to pull Perl into the buildroot for every package. The add-determinism tool eliminates many causes of non-determinism, and work is ongoing to extend the range of packages it can operate on.
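To make “clamping to $SOURCE_DATE_EPOCH” concrete, here is a minimal Python sketch that applies the idea to file modification times only. It is not how add-determinism is implemented (that tool is written in Rust and also rewrites metadata inside file formats); the function name is hypothetical.

```python
import os


def clamp_mtime(path, source_date_epoch):
    """Ensure the file's mtime is no newer than `source_date_epoch`.

    Returns True if the timestamp was changed. Older timestamps are left
    alone, which is what distinguishes clamping from overwriting.
    """
    st = os.stat(path)
    if st.st_mtime > source_date_epoch:
        os.utime(path, (source_date_epoch, source_date_epoch))
        return True
    return False
```

A build script might call this with `int(os.environ["SOURCE_DATE_EPOCH"])` for every file it is about to package.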


Mailing list news

On our mailing list this month, regular contributor kpcyrd wrote to the list with an update on their source code indexing project, whatsrc.org. The whatsrc.org project, which was launched last month in response to the XZ Utils backdoor, now contains and indexes almost 250,000 unique source code archives. In their post, kpcyrd gives an example of its intended purpose, noting that it has shown that whilst “there seems to be consensus about [the] source code for zsh 5.9” in various Linux distributions, it “does not align with the contents of the zsh Git repository”.

Holger Levsen also posted to the list with a ‘pre-announcement’ of sorts for the 2024 Reproducible Builds summit. In particular:

[Whilst] the dates and location are not fixed yet, however if you don’t help us with finding a suitable location soon, it is very likely that we’ll meet again in Hamburg in the 2nd half of September 2024 […].

Lastly, Frederic-Emmanuel Picca wrote to the list asking for help understanding the “non-reproducible status of the Debian silx package” and received replies from both Vagrant Cascadian and Chris Lamb.


Miscellaneous news

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month strip-nondeterminism version 1.14.0-1 was uploaded to Debian unstable by Chris Lamb chiefly to incorporate a change from Alex Muntada to avoid a dependency on Sub::Override to perform monkey-patching and break circular dependencies related to debhelper. Elsewhere in our tooling, Jelle van der Waa modified reprotest because the pipes module will be removed in Python version 3.13.
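For code affected by the removal of pipes, a common migration is to replace pipes.quote with shlex.quote from the standard library. The sketch below assumes shell quoting was the functionality in use, which the report does not state; the helper name is hypothetical and this is not the actual reprotest change.

```python
import shlex


def build_shell_command(args):
    """Join arguments into a single shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_shell_command(["echo", "hello world", "$HOME"])
# cmd == "echo 'hello world' '$HOME'"
```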


It was also noticed that a new blog post by Daniel Stenberg detailing “How to verify a Curl release” mentions the SOURCE_DATE_EPOCH environment variable. This is because:

The [curl] release tools document also contains another key component: the exact time stamp at which the release was done – using integer second resolution. In order to generate a correct tarball clone, you need to also generate the new version using the old version’s timestamp. Because the modification date of all files in the produced tarball will be set to this timestamp.
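The role of the timestamp can be seen in a small Python sketch that builds a tar archive in memory with every member's modification time set to one fixed value. This is not curl's release tooling; the function name, file names and timestamps are made up for illustration.

```python
import io
import tarfile


def make_tarball(files, timestamp):
    """Build an uncompressed tar archive with normalised metadata.

    `files` maps member names to their contents as bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = timestamp  # same timestamp for every member
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical release contents and timestamps.
sources = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
first = make_tarball(sources, 1700000000)
second = make_tarball(sources, 1700000000)
other = make_tarball(sources, 1700000001)
```

With identical inputs and the same timestamp, `first` and `second` are byte-for-byte identical; changing the timestamp by a single second, as in `other`, produces a different archive, which is why a verifier needs the exact release time.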


Furthermore, Fay Stegerman filed a bug against the Signal messenger app for Android to report that their ‘reproducible’ builds cannot, in fact, be reproduced. However, Fay is quick to note that she has:

… found zero evidence of any kind of compromise. Some differences are yet unexplained but everything I found seems to be benign. I am disappointed that Reproducible Builds have been broken for months but I have zero reason to doubt Signal’s security in any way.


Lastly, it was observed that there was a concise and diagrammatic overview of “supply chain threats” on the SLSA website.


Two new academic papers

Two new scholarly papers were published this month.

Firstly, Mathieu Acher, Benoît Combemale, Georges Aaron Randrianaina and Jean-Marc Jézéquel of the University of Rennes published a paper on Embracing Deep Variability For Reproducibility & Replicability. The authors describe their approach as follows:

In this short [vision] paper we delve into the application of software engineering techniques, specifically variability management, to systematically identify and explicit points of variability that may give rise to reproducibility issues (e.g., language, libraries, compiler, virtual machine, OS, environment variables, etc.). The primary objectives are: i) gaining insights into the variability layers and their possible interactions, ii) capturing and documenting configurations for the sake of reproducibility, and iii) exploring diverse configurations to replicate, and hence validate and ensure the robustness of results. By adopting these methodologies, we aim to address the complexities associated with reproducibility and replicability in modern software systems and environments, facilitating a more comprehensive and nuanced perspective on these critical aspects.

(A PDF of this article is available.)


Secondly, Ludovic Courtès, Timothy Sample, Simon Tournier and Stefano Zacchiroli have collaborated to publish a paper on Source Code Archiving to the Rescue of Reproducible Deployment. Their paper was motivated because:

The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.

(A PDF of this article is also available.)


diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb uploaded versions 266, 267, 268 and 269 to Debian, which included the following changes:

  • New features:

    • Use xz --list to supplement output when comparing .xz archives; essential when metadata differs. (#1069329)
    • Include xz --verbose --verbose (i.e. double) output. (#1069329)
    • Strip the first line from the xz --list output.
    • Only include xz --list --verbose output if the xz has no other differences.
    • Actually append the xz --list after the container differences, as it simplifies a lot.
  • Testing improvements:

    • Allow Debian testing to fail right now.
    • Drop apktool from Build-Depends; we can still test APK functionality via autopkgtests. (#1071410)
    • Add a versioned dependency for at least version 5.4.5 for the xz tests as they fail under (at least) version 5.2.8. (#374)
    • Fix tests for 7zip 24.05.
    • Fix all tests after addition of xz --list.
  • Misc:

    • Update copyright years.

In addition, James Addison fixed an issue where the HTML output showed only the first difference in a file, while the text output showed all differences. Sergei Trofimovich amended the 7zip version test for older 7z versions that include the string “[64]”, and Vagrant Cascadian relaxed the versioned dependency to allow version 5.4.1 for the xz tests, proposed updates to Guix for versions 267 and 268, and pushed version 269 to Guix. Furthermore, Eli Schwartz updated the diffoscope.org website in order to explain how to install diffoscope on Gentoo.
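To illustrate why container metadata matters when comparing .xz archives, the following Python sketch compresses the same payload twice with different integrity check types. It uses the standard library lzma module rather than diffoscope or the xz command-line tool, and the payload is arbitrary.

```python
import lzma

payload = b"reproducible builds\n" * 100

# Same payload, different integrity check stored in the .xz container.
with_crc32 = lzma.compress(payload, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)
with_crc64 = lzma.compress(payload, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)

same_content = lzma.decompress(with_crc32) == lzma.decompress(with_crc64)
same_bytes = with_crc32 == with_crc64
```

Here `same_content` is True while `same_bytes` is False: the two archives differ even though their decompressed contents are identical. A diff of the contents alone would show nothing, whereas the check type is part of what xz --list reports.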


Website updates

There were a number of improvements made to our website this month, including Chris Lamb making the “print” CSS stylesheet nicer. Fay Stegerman made a number of updates to the page about the SOURCE_DATE_EPOCH environment variable and Holger Levsen added some of their presentations to the “Resources” page. Furthermore, IOhannes zmölnig noted support for SOURCE_DATE_EPOCH in clang version 16.0.0+, Jan Zerebecki expanded the “Formal definition” page and fixed a number of typos on the “Buy-in” page, and Simon Josefsson fixed the link to Trisquel GNU/Linux on the “Projects” page.


Upstream patches

This month, we wrote a number of patches to fix specific reproducibility issues.


Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:

  • Debian-related changes:

    • Enable the rebuilder-snapshot API on osuosl4.
    • Schedule the i386 architecture a bit more often.
    • Adapt cleanup_nodes.sh to the new way of running our build services.
    • Add 8 more workers for the i386 architecture.
    • Update configuration now that the infom07 and infom08 nodes have been reinstalled as “real” i386 systems.
    • Make diffoscope timeouts more visible on the #debian-reproducible-changes IRC channel.
    • Mark the cbxi4a-armhf node as down.
    • Only install the hdmi2usb-mode-switch package on Debian bookworm and earlier, and only install the haskell-platform package on Debian bullseye.
  • Misc:

    • Install the ntpdate utility as we need it later.
    • Document the progress on the i386 architecture nodes at Infomaniak.
    • Drop an outdated and unnoticed notice.
    • Add live_setup_schroot to the list of so-called “zombie” jobs.

In addition, Mattia Rizzolo reinstalled the infom07 and infom08 nodes and Vagrant Cascadian marked the cbxi4a node as online.



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.



