Welcome to the May 2024 report from the Reproducible Builds project! In these reports, we try to outline what we have been up to over the past month and highlight news items in software supply-chain security more broadly. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Table of contents:
- A peek into build provenance for Homebrew
- Distribution news
- Mailing list news
- Miscellaneous news
- Two new academic papers
- diffoscope
- Website updates
- Upstream patches
- Reproducibility testing framework
A peek into build provenance for Homebrew
Joe Sweeney and William Woodruff wrote an extensive post on the Trail of Bits blog about build provenance for Homebrew, the third-party package manager for macOS. Their post details how each “bottle” (i.e. each release):
[…] built by Homebrew will come with a cryptographically verifiable statement binding the bottle’s content to the specific workflow and other build-time metadata that produced it. […] In effect, this injects greater transparency into the Homebrew build process, and diminishes the threat posed by a compromised or malicious insider by making it impossible to trick ordinary users into installing non-CI-built bottles.
The post also briefly touches on future work, including work on source provenance:
Homebrew’s formulae already hash-pin their source artifacts, but we can go a step further and additionally assert that source artifacts are produced by the repository (or other signing identity) that’s latent in their URL or otherwise embedded into the formula specification.
Distribution news
In Debian this month, Johannes Schauer Marin Rodrigues (aka josch) noticed that the Debian binary package bash version 5.2.15-2+b3 was “uploaded to the archive twice. Once to bookworm and once to sid but with differing content.” This is a problem for reproducible builds in Debian due to its assumption that the package name, version and architecture triplet is unique. In particular, josch highlighted that:
This example with bash is especially problematic since bash is Essential:yes, so there will now be a large portion of .buildinfo files where it is not possible to figure out with which of the two differing bash packages the sources were compiled.
In response to this, Holger Levsen performed an analysis of all .buildinfo files and found that almost 1,500 binNMUs are needed to fix the fallout from this bug.
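The uniqueness assumption described above can be illustrated with a minimal Python sketch. It groups package records by their (name, version, architecture) triplet and reports any triplet associated with more than one checksum. This is not the analysis Holger performed: the function name, the record layout, the amd64 architecture and the placeholder checksums are all assumptions made for illustration.

```python
from collections import defaultdict


def find_conflicting_triplets(records):
    """Return triplets that map to more than one distinct checksum.

    `records` is an iterable of (name, version, architecture, checksum)
    tuples. This layout is hypothetical, chosen only for this sketch.
    """
    seen = defaultdict(set)
    for name, version, arch, checksum in records:
        seen[(name, version, arch)].add(checksum)
    # A triplet is only a problem if it identifies differing content.
    return {
        triplet: sorted(checksums)
        for triplet, checksums in seen.items()
        if len(checksums) > 1
    }


# Hypothetical records: the checksums are placeholders, not real archive data.
records = [
    ("bash", "5.2.15-2+b3", "amd64", "aaaa"),  # e.g. the upload to bookworm
    ("bash", "5.2.15-2+b3", "amd64", "bbbb"),  # e.g. the upload to sid
    ("coreutils", "9.1-1", "amd64", "cccc"),
]
conflicts = find_conflicting_triplets(records)
```

Any triplet that appears in `conflicts` cannot be resolved to a single binary from a .buildinfo file alone, which is the ambiguity josch describes.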
Elsewhere in Debian, Vagrant Cascadian posted about a Non-Maintainer Upload (NMU) sprint to take place during early June, and it was announced that there is now a #debian-snapshot IRC channel on OFTC to discuss the creation of a new source code archiving service to, perhaps, replace snapshot.debian.org. Lastly, 11 reviews of Debian packages were added, 15 were updated and 48 were removed this month, adding to our extensive knowledge about identified issues. A number of issue types have been updated by Chris Lamb as well. […][…]
Elsewhere in the world of distributions, deep within a larger announcement from Colin Percival about the release of version 14.1-BETA2, it was mentioned that the FreeBSD kernels are now built reproducibly.
In Fedora, however, the change proposal mentioned in our report for April 2024 was approved, so, per the ReproduciblePackageBuilds wiki page, the add-determinism tool is now running in new builds for Fedora 41 (‘rawhide’). The add-determinism tool is a Rust program which, as its name suggests, adds determinism to files that are given as input by “attempting to standardize metadata contained in binary or source files to ensure consistency and clamping to $SOURCE_DATE_EPOCH in all instances”. This is essentially the Fedora version of Debian’s strip-nondeterminism. However, strip-nondeterminism is written in Perl, and Fedora did not want to pull Perl into the buildroot for every package. The add-determinism tool eliminates many causes of non-determinism and work is ongoing to extend the scope of packages it can operate on.
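To make the idea of “clamping” concrete, here is a minimal Python sketch that lowers any file modification time newer than a given epoch value, leaving older timestamps untouched. It is not how add-determinism is implemented (that tool is written in Rust and also rewrites metadata embedded inside files); the function names are hypothetical and only filesystem mtimes are handled.

```python
import os


def clamp_mtime(path, source_date_epoch):
    """Clamp a file's mtime so it is never newer than source_date_epoch.

    Returns True if the timestamp was changed.
    """
    if os.stat(path).st_mtime > source_date_epoch:
        os.utime(path, (source_date_epoch, source_date_epoch))
        return True
    return False


def clamp_tree(root, source_date_epoch):
    """Clamp every regular file below `root`; return the paths changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            if clamp_mtime(path, source_date_epoch):
                changed.append(path)
    return changed
```

In a build script, the epoch would typically be read with something like `int(os.environ["SOURCE_DATE_EPOCH"])`. Clamping rather than overwriting every timestamp preserves genuinely older dates while still removing the build time from the output.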
Mailing list news
On our mailing list this month, regular contributor kpcyrd wrote to the list with an update on their source code indexing project, whatsrc.org. The whatsrc.org project, which was launched last month in response to the XZ Utils backdoor, now contains and indexes almost 250,000 unique source code archives. In their post, kpcyrd gives an example of its intended purpose, noting that it has shown that whilst “there seems to be consensus about [the] source code for zsh 5.9” in various Linux distributions, it “does not align with the contents of the zsh Git repository”.
Holger Levsen also posted to the list with a ‘pre-announcement’ of sorts for the 2024 Reproducible Builds summit. In particular:
[Whilst] the dates and location are not fixed yet, however if you don’t help us with finding a suitable location soon, it is very likely that we’ll meet again in Hamburg in the 2nd half of September 2024 […].
Lastly, Frederic-Emmanuel Picca wrote to the list asking for help understanding the “non-reproducible status of the Debian silx package” and received replies from both Vagrant Cascadian and Chris Lamb.
Miscellaneous news
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month strip-nondeterminism version 1.14.0-1 was uploaded to Debian unstable by Chris Lamb chiefly to incorporate a change from Alex Muntada to avoid a dependency on Sub::Override to perform monkey-patching and break circular dependencies related to debhelper […]. Elsewhere in our tooling, Jelle van der Waa modified reprotest because the pipes module will be removed in Python version 3.13 […].
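For readers facing the same removal in their own code, a common migration is sketched below. It assumes the code in question used pipes.quote, which in Python 3 was simply a re-export of shlex.quote; the report does not say which part of the pipes module reprotest relied on, and the build_command helper is hypothetical.

```python
import shlex


def build_command(program, *args):
    """Join a program and its arguments into a shell-safe command string."""
    # shlex.quote is the documented replacement for the old pipes.quote.
    return " ".join(shlex.quote(part) for part in (program, *args))


command = build_command("ls", "-l", "my file.txt")
```

Here `command` is a single string in which the argument containing a space has been quoted, so it can be passed to a shell without being split.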
It was also noticed that a new blog post by Daniel Stenberg detailing “How to verify a Curl release” mentions the SOURCE_DATE_EPOCH environment variable. This is because:
The [curl] release tools document also contains another key component: the exact time stamp at which the release was done – using integer second resolution. In order to generate a correct tarball clone, you need to also generate the new version using the old version’s timestamp. Because the modification date of all files in the produced tarball will be set to this timestamp.
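The role of the timestamp can be sketched in Python: if every archive member is given the same fixed modification time (along with normalized ownership and a stable file order), rebuilding the tarball from identical file contents yields identical bytes. This is an illustration under those assumptions and is not curl's release tooling; the make_tarball function is hypothetical.

```python
import io
import os
import tarfile


def make_tarball(root, timestamp):
    """Return the bytes of an uncompressed tarball of `root` in which every
    member carries the same fixed mtime and neutral ownership."""

    def normalize(info):
        info.mtime = timestamp
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for filename in sorted(filenames):
            paths.append(os.path.join(dirpath, filename))

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for path in paths:
            arcname = os.path.relpath(path, root)
            archive.add(path, arcname=arcname, recursive=False, filter=normalize)
    return buffer.getvalue()
```

Calling `make_tarball` twice with the same timestamp on the same tree should return identical bytes even if the files' on-disk modification times changed in between. Compression is deliberately left out, since compressors can embed their own timestamps.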
Furthermore, Fay Stegerman filed a bug against the Signal messenger app for Android to report that their ‘reproducible’ builds cannot, in fact, be reproduced. However, Fay is quick to note that she has:
… found zero evidence of any kind of compromise. Some differences are yet unexplained but everything I found seems to be benign. I am disappointed that Reproducible Builds have been broken for months but I have zero reason to doubt Signal’s security in any way.
Lastly, it was observed that there was a concise and diagrammatic overview of “supply chain threats” on the SLSA website.
Two new academic papers
Two new scholarly papers were published this month.
Firstly, Mathieu Acher, Benoît Combemale, Georges Aaron Randrianaina and Jean-Marc Jézéquel of the University of Rennes published a paper on Embracing Deep Variability For Reproducibility & Replicability. The authors describe their approach as follows:
In this short [vision] paper we delve into the application of software engineering techniques, specifically variability management, to systematically identify and explicit points of variability that may give rise to reproducibility issues (e.g., language, libraries, compiler, virtual machine, OS, environment variables, etc.). The primary objectives are: i) gaining insights into the variability layers and their possible interactions, ii) capturing and documenting configurations for the sake of reproducibility, and iii) exploring diverse configurations to replicate, and hence validate and ensure the robustness of results. By adopting these methodologies, we aim to address the complexities associated with reproducibility and replicability in modern software systems and environments, facilitating a more comprehensive and nuanced perspective on these critical aspects.
(A PDF of this article is available.)
Secondly, Ludovic Courtès, Timothy Sample, Simon Tournier and Stefano Zacchiroli have collaborated to publish a paper on Source Code Archiving to the Rescue of Reproducible Deployment. Their paper was motivated because:
The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.
(A PDF of this article is also available.)
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb uploaded versions 266, 267, 268 and 269 to Debian, making the following changes:
- New features:
  - Use xz --list to supplement output when comparing .xz archives; essential when metadata differs. (#1069329)
  - Include xz --verbose --verbose (ie. double) output. (#1069329)
  - Strip the first line from the xz --list output. […]
  - Only include xz --list --verbose output if the xz has no other differences. […]
  - Actually append the xz --list after the container differences, as it simplifies a lot. […]
- Testing improvements:
  - Allow Debian testing to fail right now. […]
  - Drop apktool from Build-Depends; we can still test APK functionality via autopkgtests. (#1071410)
  - Add a versioned dependency for at least version 5.4.5 for the xz tests as they fail under (at least) version 5.2.8. (#374)
  - Fix tests for 7zip 24.05. […][…]
  - Fix all tests after addition of xz --list. […][…]
- Misc:
  - Update copyright years. […]
In addition, James Addison fixed an issue where the HTML output showed only the first difference in a file, while the text output shows all differences […][…][…], Sergei Trofimovich amended the 7zip version test for older 7z versions that include the string “[64]” […][…] and Vagrant Cascadian relaxed the versioned dependency to allow version 5.4.1 for the xz tests […], proposed updates to Guix for versions 267 and 268, and pushed version 269 to Guix. Furthermore, Eli Schwartz updated the diffoscope.org website in order to explain how to install diffoscope on Gentoo […].
Website updates
There were a number of improvements made to our website this month, including Chris Lamb making the “print” CSS stylesheet nicer […]. Fay Stegerman made a number of updates to the page about the SOURCE_DATE_EPOCH environment variable […][…][…] and Holger Levsen added some of their presentations to the “Resources” page. Furthermore, IOhannes zmölnig stipulated support for SOURCE_DATE_EPOCH in clang version 16.0.0+ […], Jan Zerebecki expanded the “Formal definition” page and fixed a number of typos on the “Buy-in” page […] and Simon Josefsson fixed the link to Trisquel GNU/Linux on the “Projects” page […].
Upstream patches
This month, we wrote a number of patches to fix specific reproducibility issues, including:
- Bernhard M. Wiedemann:
- Chris Lamb:
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:
- Debian-related changes:
  - Enable the rebuilder-snapshot API on osuosl4. […]
  - Schedule the i386 architecture a bit more often. […]
  - Adapt cleanup_nodes.sh to the new way of running our build services. […]
  - Add 8 more workers for the i386 architecture. […]
  - Update configuration now that the infom07 and infom08 nodes have been reinstalled as “real” i386 systems. […]
  - Make diffoscope timeouts more visible on the #debian-reproducible-changes IRC channel. […]
  - Mark the cbxi4a-armhf node as down. […][…]
  - Only install the hdmi2usb-mode-switch package on Debian bookworm and earlier […] and only install the haskell-platform package on Debian bullseye […].
- Misc:
  - Install the ntpdate utility as we need it later. […]
  - Document the progress on the i386 architecture nodes at Infomaniak. […]
  - Drop an outdated and unnoticed notice. […]
  - Add live_setup_schroot to the list of so-called “zombie” jobs. […]
In addition, Mattia Rizzolo reinstalled the infom07 and infom08 nodes […] and Vagrant Cascadian marked the cbxi4a node as online […].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mastodon: @reproducible_builds@fosstodon.org
- Mailing list: rb-general@lists.reproducible-builds.org