Welcome to the March 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Table of contents:
- Arch Linux minimal container userland now 100% reproducible
- Validating Debian’s build infrastructure after the XZ backdoor
- Making Fedora Linux (more) reproducible
- Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management
- Software and source code identification with GNU Guix and reproducible builds
- Two new Rust-based tools for post-processing determinism
- Distribution work
- Mailing list highlights
- Website updates
- Delta chat clients now reproducible
- diffoscope updates
- Upstream patches
- Reproducibility testing framework
Arch Linux minimal container userland now 100% reproducible
In remarkable news, Reproducible Builds developer kpcyrd reported that the Arch Linux “minimal container userland” is now 100% reproducible after work by developers dvzv and Foxboron on the one remaining package. This represents a “real world”, widely-used Linux distribution being reproducible.
Their post, which kpcyrd suffixed with the question “now what?”, continues on to outline some potential next steps, including validating whether the container image itself could be reproduced bit-for-bit. The post, which was itself a followup for an Arch Linux update earlier in the month, generated a significant number of replies.
Validating Debian’s build infrastructure after the XZ backdoor
From our mailing list this month, Vagrant Cascadian wrote about being asked to perform concrete reproducibility checks for recent Debian security updates, in an attempt to gain some confidence about Debian’s build infrastructure given that it performed builds in environments running the high-profile XZ vulnerability.
Vagrant reports (with some caveats):
So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.
That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.
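At its core, such a check is a byte-for-byte comparison, usually done by comparing cryptographic digests of the rebuilt artifact and the published one. The following is a minimal sketch in Python, not Vagrant's actual tooling; the function names and the file paths in the usage note are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_for_bit_identical(rebuilt: Path, published: Path) -> bool:
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(rebuilt) == sha256_of(published)
```

It could be called as `is_bit_for_bit_identical(Path("rebuilt/foo.deb"), Path("archive/foo.deb"))` (hypothetical paths). When the digests differ, a tool such as diffoscope can help locate where the two artifacts diverge.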
Making Fedora Linux (more) reproducible
In March, Davide Cavalca gave a talk at the 2024 Southern California Linux Expo (aka SCALE 21x) about the ongoing effort to make the Fedora Linux distribution reproducible.
Documented in more detail on Fedora’s website, the talk touched on topics such as the specifics of implementing reproducible builds in Fedora, the challenges encountered, the current status and what’s coming next. (YouTube video)
“Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management”
Julien Malka published a brief but interesting paper in the HAL open archive on Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management:
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain. PDF
Julien’s paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can “be leveraged to further improve the safety of the software supply chain”, etc.
Software and source code identification with GNU Guix and reproducible builds
Continuing a long line of commendably detailed blog posts, two interesting posts appeared on the GNU Guix blog this month. In early March, Ludovic Courtès, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier wrote about software and source code identification and how that might be performed using Guix, rhetorically posing the questions: “What does it take to ‘identify software’? How can we tell what software is running on a machine to determine, for example, what security vulnerabilities might affect it?”
Later in the month, Ludovic Courtès wrote a solo post describing adventures on the quest for long-term reproducible deployment. Ludovic’s post touches on GNU Guix’s aim to support “time travel”, the ability to reliably (and reproducibly) revert to an earlier point in time, employing the iconic image of Harold Lloyd hanging off the clock in Safety Last! (1925) to poetically illustrate both the slapstick nature of current modern technology and the gymnastics required to navigate hazards of our own making.
Two new Rust-based tools for post-processing determinism
Zbigniew Jędrzejewski-Szmek announced add-determinism, a work-in-progress reimplementation of the Reproducible Builds project’s own strip-nondeterminism tool in the Rust programming language, intended to be used as a post-processor in RPM-based distributions such as Fedora.
In addition, Yossi Kreinin published a blog post titled “refix: fast, debuggable, reproducible builds” that describes a tool that post-processes binaries in such a way that they are still debuggable with gdb, etc. Yossi’s post details the motivation and techniques behind the (fast) performance of the tool.
Distribution work
In Debian this month, since the testing framework no longer varies the build path, James Addison performed a bulk downgrade of the bug severity for issues filed with a level of normal to a new level of wishlist. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month adding to ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated, including Chris Lamb adding a new ocaml_include_directories toolchain issue […] and James Addison adding a new filesystem_order_in_java_jar_manifest_mf_include_resource issue […] and updating the random_uuid_in_notebooks_generated_by_nbsphinx to reference a relevant discussion thread […].
In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit time_t transition.
Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.
Mailing list highlights
Elsewhere on our mailing list this month:
- Alexander Railean of Siemens asked the list to aid in understanding how one can independently verify the reproducibility of Java projects from the Maven Central repository. Having explored those repositories, Alexander could not find examples where the buildinfo file was present. Arnout Engelen responded with some details.
- Fay Stegerman resuscitated a long-dormant thread to report that she added support in her diff-zip-meta.py tool to expose extra timestamps embedded in .zip and .apk metadata.
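To illustrate why ZIP metadata matters for reproducibility, the sketch below lists the per-entry modification time stored in an archive, along with the size of each entry's "extra" field, which is where formats such as the Info-ZIP extended timestamp can store further timestamps. This is not Fay's diff-zip-meta.py; it uses only Python's standard zipfile module, and the function name is hypothetical:

```python
import io
import zipfile


def zip_entry_timestamps(data: bytes) -> list[tuple[str, tuple, int]]:
    """List (name, date_time, extra_field_length) for each archive entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            # date_time is the (year, month, day, hour, minute, second) tuple
            # from the central directory; extra holds the raw extra-field bytes.
            (info.filename, info.date_time, len(info.extra))
            for info in archive.infolist()
        ]
```

Two archives with identical file contents can still differ in these fields, which is enough to make the archives themselves differ bit-for-bit.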
Website updates
A number of improvements were made to our website this month, including:
- Pol Dellaiera noticed the frequent need to correctly cite the website itself in academic work. To facilitate easier citation across multiple formats, Pol contributed a Citation File Format (CFF) file. As a result, an export in BibTeX format is now available in the Academic Publications section. Pol encourages community contributions to further refine the CITATION.cff file. Pol also added a substantial new section to the “buy in” page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. […][…]
- Bernhard M. Wiedemann added a new “commandments” page to the documentation […][…] and fixed some incorrect YAML elsewhere on the site […].
- Chris Lamb added three recent academic papers to the publications page of the website. […]
- Mattia Rizzolo and Holger Levsen collaborated to add Infomaniak as a sponsor of amd64 virtual machines. […][…][…]
- Roland Clobus updated the “stable outputs” page, dropping version numbers from Python documentation pages […] and noting that Python’s set data structure is also affected by the PYTHONHASHSEED functionality. […]
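The effect of PYTHONHASHSEED on sets can be sketched as follows. The helper below is hypothetical and not taken from the “stable outputs” page: it prints a set of strings in a child Python interpreter under a given hash seed, either in raw iteration order or sorted. Raw iteration order may change between seeds, whereas sorting before output gives the same result regardless of the seed:

```python
import os
import subprocess
import sys


def render(sort: bool, seed: str) -> str:
    """Print a set of strings in a child interpreter with a fixed hash seed."""
    expr = "{'alpha', 'beta', 'gamma', 'delta'}"
    if sort:
        # Sorting removes the dependency on hash-based iteration order.
        expr = f"sorted({expr})"
    code = f"print(','.join({expr}))"
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Comparing `render(False, "1")` with `render(False, "2")` may show different orderings, while `render(True, "1")` and `render(True, "2")` always agree. For build outputs, sorting before serialising is the more robust choice, since it does not rely on every builder setting the same seed.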
Delta chat clients now reproducible
Delta Chat, an open source messaging application that can work over email, announced this month that the Rust-based core library underlying the Delta Chat application is now reproducible.
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 259, 260 and 261 to Debian and made the following additional changes:
- New features:
  - Add support for the zipdetails tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. […]
- Bug fixes:
  - Don’t identify Redis database dumps as GNU R database files based simply on their filename. […]
  - Add a missing call to File.recognizes so we actually perform the filename check for GNU R data files. […]
  - Don’t crash if we encounter an .rdb file without an equivalent .rdx file. (#1066991)
  - Correctly check for 7z being available (and not lz4) when testing 7z. […]
  - Prevent a traceback when comparing a contentful .pyc file with an empty one. […]
- Testsuite improvements:
  - Fix .epub tests after supporting the new zipdetails tool. […]
  - Don’t use parentheses within test “skipping…” messages, as PyTest adds its own parentheses. […]
  - Factor out Python version checking in test_zip.py. […]
  - Skip some Zip-related tests under Python 3.10.14, as a potential regression may have been backported to the 3.10.x series. […]
  - Actually test 7z support in the test_7z set of tests, not the lz4 functionality. (Closes: reproducible-builds/diffoscope#359). […]
In addition, Fay Stegerman updated diffoscope’s monkey patch for supporting the unusual Mozilla ZIP file format after Python’s zipfile module changed to detect potentially insecure overlapping entries within .zip files. (#362)
Chris Lamb also updated the trydiffoscope command line client, dropping a build-dependency on the deprecated python3-distutils package to fix Debian bug #1065988 […], taking a moment to also refresh the packaging to the latest Debian standards […]. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. […]
Upstream patches
This month, we wrote a large number of patches, including:
- Bernhard M. Wiedemann:
  - helm (SSL-related build failure)
  - java-21-openjdk (parallelism)
  - libressl (SSL-related build failure)
  - nfdump (date issue)
  - python-django-q (avoid stuck build)
  - python-smart-open (fails to build on single-CPU machines)
  - python-stdnum (fails to build in 2039)
  - python-yarl (regression)
  - qemu (build failure)
  - rabbitmq-java-client (with Fridrich Strba; Maven timestamp issue)
  - rmw (build fails in 2038)
  - warewulf (with Egbert Eich; cpio modification time and inode issue)
  - wxWidgets (fails to build in 2038)
- Chris Lamb:
  - #1066042 filed against python-quantities.
  - #1066083 filed against gnome-maps.
  - #1066084 filed against tox.
  - #1066085 filed against q2cli.
  - #1067098 filed against mpl-sphinx-theme.
  - #1067099 filed against woof-doom.
  - #1067100 filed against bochs.
  - #1067101 filed against storm-lang.
  - #1067102 filed against librsvg.
  - #1067218 filed against gretl.
  - #1067483 filed against postfix.
  - #1067484 filed against node-function-bind.
  - #1067485 filed against python-pysaml2.
  - #1067947 filed against golang-github-stvp-tempredis.
- James Addison:
  - #1065124 filed against matplotlib.
  - #1066014 filed against pathos.
  - #1066016 filed against rdflib.
  - #1066017 filed against xonsh.
  - #1066045 filed against maven-bundle-plugin. (This patch was then uploaded by Mattia Rizzolo.)
- Jiří Techet:
  - geany (toolchain-related issue for glfw)
Bernhard M. Wiedemann used reproducibility-tooling to detect and fix packages that added changes in their %check section, thus failing when built with the --no-checks option. Only half of all openSUSE packages were tested so far, but a large number of bugs were filed, including ones against caddy, exiv2, gnome-disk-utility, grisbi, gsl, itinerary, kosmindoormap, libQuotient, med-tools, plasma6-disks, pspp, python-pypuppetdb, python-urlextract, rsync, vagrant-libvirt and xsimd.
Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the ath9k-htc-firmware package. As the change produced bit-for-bit identical binaries to the previously shipped pre-built binaries, Jean-Pierre wrote:
I don’t have the hardware to test this firmware, but the build produces the same hashes for the firmware so it’s safe to say that the firmware should keep working.
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.
In March, an enormous number of changes were made by Holger Levsen:
- Debian-related changes:
  - Sleep less after a so-called “404” package state has occurred. […]
  - Schedule package builds more often. […][…]
  - Regenerate all our HTML indexes every hour, but only every 12h for the released suites. […]
  - Create and update unstable and experimental base systems on armhf again. […][…]
  - Don’t reschedule so many “depwait” packages due to the current size of the i386 architecture queue. […]
  - Redefine our scheduling thresholds and amounts. […]
  - Schedule untested packages with a higher priority, otherwise slow architectures cannot keep up with the experimental distribution growing. […]
  - Only create the stats_buildinfo.png graph once per day. […][…]
  - Reproducible Debian dashboard: refactoring, update several more static stats only every 12h. […]
  - Document how to use systemctl with new systemd-based services. […]
  - Temporarily disable armhf and i386 continuous integration tests in order to get some stability back. […]
  - Use the deb.debian.org CDN everywhere. […]
  - Remove the rsyslog logging facility on bookworm systems. […]
  - Add zst to the list of packages which are false-positive diskspace issues. […]
  - Detect failures to bootstrap Debian base systems. […]
- Arch Linux-related changes:
- Misc changes:
  - Show failed services that require manual cleanup. […][…]
  - Integrate two new Infomaniak nodes. […][…][…][…]
  - Improve IRC notifications for artifacts. […]
  - Run diffoscope in different systemd slices. […]
  - Run the node health check more often, as it can now repair some issues. […][…]
  - Also include the string Bot in the userAgent for Git. (Re: #929013). […]
  - Document increased tmpfs size on our OSUOSL nodes. […]
  - Disable memory account for the reproducible_build service. […][…]
  - Allow 10 times as many open files for the Jenkins service. […]
  - Set OOMPolicy=continue and OOMScoreAdjust=-1000 for both the Jenkins and the reproducible_build service. […]
Mattia Rizzolo also made the following changes:
- Debian-related changes:
  - Define a systemd slice to group all relevant services. […][…]
  - Add a bunch of quotes in scripts to assuage the shellcheck tool. […]
  - Add stats on how many packages have been built today so far. […]
  - Instruct systemd-run to handle diffoscope’s exit codes specially. […]
  - Prefer the pgrep tool over grepping the output of ps. […]
  - Re-enable a couple of i386 and armhf architecture builders. […][…]
  - Fix some stylistic issues flagged by the Python flake8 tool. […]
  - Cease scheduling Debian unstable and experimental on the armhf architecture due to the time_t transition. […]
  - Start a few more i386 & armhf workers. […][…][…]
  - Temporarily skip pbuilder updates in the unstable distribution, but only on the armhf architecture. […]
- Other changes:
  - Perform some large-scale refactoring on how the systemd service operates. […][…]
  - Move the list of workers into a separate file so it’s accessible to a number of scripts. […]
  - Refactor the powercycle_x86_nodes.py script to use the new IONOS API and its new Python bindings. […]
  - Also fix nph-logwatch after the worker changes. […]
  - Do not install the stunnel tool anymore, it shouldn’t be needed by anything anymore. […]
  - Move temporary directories related to Arch Linux into a single directory for clarity. […]
  - Update the arm64 architecture host keys. […]
  - Use a common Postfix configuration. […]
The following changes were also made by:
- Jan-Benedict Glaw:
- Roland Clobus:
- Vagrant Cascadian:
Node maintenance was also performed by Holger Levsen, Mattia Rizzolo […][…] and Vagrant Cascadian […][…][…][…].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mastodon: @reproducible_builds@fosstodon.org
- Mailing list: rb-general@lists.reproducible-builds.org