Reproducible Builds in March 2024

View all our monthly reports


Welcome to the March 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Table of contents:

  1. Arch Linux minimal container userland now 100% reproducible
  2. Validating Debian’s build infrastructure after the XZ backdoor
  3. Making Fedora Linux (more) reproducible
  4. Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management
  5. Software and source code identification with GNU Guix and reproducible builds
  6. Two new Rust-based tools for post-processing determinism
  7. Distribution work
  8. Mailing list highlights
  9. Website updates
  10. Delta chat clients now reproducible
  11. diffoscope updates
  12. Upstream patches
  13. Reproducibility testing framework

Arch Linux minimal container userland now 100% reproducible

In remarkable news, Reproducible builds developer kpcyrd reported that that the Arch Linux “minimal container userland” is now 100% reproducible after work by developers dvzv and Foxboron on the one remaining package. This represents a “real world”, widely-used Linux distribution being reproducible.

Their post, which kpcyrd suffixed with the question “now what?”, continues on to outline some potential next steps, including validating whether the container image itself could be reproduced bit-for-bit. The post, which was itself a followup for an Arch Linux update earlier in the month, generated a significant number of replies.


Validating Debian’s build infrastructure after the XZ backdoor

From our mailing list this month, Vagrant Cascadian wrote about being asked about trying to perform concrete reproducibility checks for recent Debian security updates, in an attempt to gain some confidence about Debian’s build infrastructure given that they performed builds in environments running the high-profile XZ vulnerability.

Vagrant reports (with some caveats):

So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.

That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.


Making Fedora Linux (more) reproducible

In March, Davide Cavalca gave a talk at the 2024 Southern California Linux Expo (aka SCALE 21x) about the ongoing effort to make the Fedora Linux distribution reproducible.

Documented in more detail on Fedora’s website, the talk touched on topics such as the specifics of implementing reproducible builds in Fedora, the challenges encountered, the current status and what’s coming next. (YouTube video)


Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management

Julien Malka published a brief but interesting paper in the HAL open archive on Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management:

Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain. PDF

Julien’s paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can “be leveraged to further improve the safety of the software supply chain”, etc.


Software and source code identification with GNU Guix and reproducible builds

In a long line of commendably detailed blog posts, Ludovic Courtès, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier have together published two interesting posts on the GNU Guix blog this month. In early March, Ludovic Courtès, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier wrote about software and source code identification and how that might be performed using Guix, rhetorically posing the questions: “What does it take to ‘identify software’? How can we tell what software is running on a machine to determine, for example, what security vulnerabilities might affect it?”

Later in the month, Ludovic Courtès wrote a solo post describing adventures on the quest for long-term reproducible deployment. Ludovic’s post touches on GNU Guix’s aim to support “time travel”, the ability to reliably (and reproducibly) revert to an earlier point in time, employing the iconic image of Harold Lloyd hanging off the clock in Safety Last! (1925) to poetically illustrate both the slapstick nature of current modern technology and the gymnastics required to navigate hazards of our own making.


Two new Rust-based tools for post-processing determinism

Zbigniew Jędrzejewski-Szmek announced add-determinism, a work-in-progress reimplementation of the Reproducible Builds project’s own strip-nondeterminism tool in the Rust programming language, intended to be used as a post-processor in RPM-based distributions such as Fedora

In addition, Yossi Kreinin published a blog post titled “refix: fast, debuggable, reproducible builds that describes a tool that post-processes binaries in such a way that they are still debuggable with gdb, etc.. Yossi post details the motivation and techniques behind the (fast) performance of the tool.


Distribution work

In Debian this month, since the testing framework no longer varies the build path, James Addison performed a bulk downgrade of the bug severity for issues filed with a level of normal to a new level of wishlist. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month adding to ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated, including Chris Lamb adding a new ocaml_include_directories toolchain issue [] and James Addison adding a new filesystem_order_in_java_jar_manifest_mf_include_resource issue [] and updating the random_uuid_in_notebooks_generated_by_nbsphinx to reference a relevant discussion thread [].

In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit time_t transition.

Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.


Mailing list highlights

Elsewhere on our mailing list this month:


Website updates

There were made a number of improvements to our website this month, including:

  • Pol Dellaiera noticed the frequent need to correctly cite the website itself in academic work. To facilitate easier citation across multiple formats, Pol contributed a Citation File Format (CIF) file. As a result, an export in BibTeX format is now available in the Academic Publications section. Pol encourages community contributions to further refine the CITATION.cff file. Pol also added an substantial new section to the “buy in” page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. [][]

  • Bernhard M. Wiedemann added a new “commandments” page to the documentation [][] and fixed some incorrect YAML elsewhere on the site [].

  • Chris Lamb add three recent academic papers to the publications page of the website. []

  • Mattia Rizzolo and Holger Levsen collaborated to add Infomaniak as a sponsor of amd64 virtual machines. [][][]

  • Roland Clobus updated the “stable outputs” page, dropping version numbers from Python documentation pages [] and noting that Python’s set data structure is also affected by the PYTHONHASHSEED functionality. []


Delta chat clients now reproducible

Delta Chat, an open source messaging application that can work over email, announced this month that the Rust-based core library underlying Delta chat application is now reproducible.


diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 259, 260 and 261 to Debian and made the following additional changes:

  • New features:

    • Add support for the zipdetails tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. []
  • Bug fixes:

    • Don’t identify Redis database dumps as GNU R database files based simply on their filename. []
    • Add a missing call to File.recognizes so we actually perform the filename check for GNU R data files. []
    • Don’t crash if we encounter an .rdb file without an equivalent .rdx file. (#1066991)
    • Correctly check for 7z being available—and not lz4—when testing 7z. []
    • Prevent a traceback when comparing a contentful .pyc file with an empty one. []
  • Testsuite improvements:

    • Fix .epub tests after supporting the new zipdetails tool. []
    • Don’t use parenthesis within test “skipping…” messages, as PyTest adds its own parenthesis. []
    • Factor out Python version checking in test_zip.py. []
    • Skip some Zip-related tests under Python 3.10.14, as a potential regression may have been backported to the 3.10.x series. []
    • Actually test 7z support in the test_7z set of tests, not the lz4 functionality. (Closes: reproducible-builds/diffoscope#359). []

In addition, Fay Stegerman updated diffoscope’s monkey patch for supporting the unusual Mozilla ZIP file format after Python’s zipfile module changed to detect potentially insecure overlapping entries within .zip files. (#362)

Chris Lamb also updated the trydiffoscope command line client, dropping a build-dependency on the deprecated python3-distutils package to fix Debian bug #1065988 [], taking a moment to also refresh the packaging to the latest Debian standards []. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. []


Upstream patches

This month, we wrote a large number of patches, including:

Bernhard M. Wiedemann used reproducibility-tooling to detect and fix packages that added changes in their %check section, thus failing when built with the --no-checks option. Only half of all openSUSE packages were tested so far, but a large number of bugs were filed, including ones against caddy, exiv2, gnome-disk-utility, grisbi, gsl, itinerary, kosmindoormap, libQuotient, med-tools, plasma6-disks, pspp, python-pypuppetdb, python-urlextract, rsync, vagrant-libvirt and xsimd.

Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the ath9k-htc-firmware package. As the change produced bit-for-bit identical binaries to the previously shipped pre-built binaries:

I don’t have the hardware to test this firmware, but the build produces the same hashes for the firmware so it’s safe to say that the firmware should keep working.


Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.

In March, an enormous number of changes were made by Holger Levsen:

  • Debian-related changes:

    • Sleep less after a so-called “404” package state has occurred. []
    • Schedule package builds more often. [][]
    • Regenerate all our HTML indexes every hour, but only every 12h for the released suites. []
    • Create and update unstable and experimental base systems on armhf again. [][]
    • Don’t reschedule so many “depwait” packages due to the current size of the i386 architecture queue. []
    • Redefine our scheduling thresholds and amounts. []
    • Schedule untested packages with a higher priority, otherwise slow architectures cannot keep up with the experimental distribution growing. []
    • Only create the stats_buildinfo.png graph once per day. [][]
    • Reproducible Debian dashboard: refactoring, update several more static stats only every 12h. []
    • Document how to use systemctl with new systemd-based services. []
    • Temporarily disable armhf and i386 continuous integration tests in order to get some stability back. []
    • Use the deb.debian.org CDN everywhere. []
    • Remove the rsyslog logging facility on bookworm systems. []
    • Add zst to the list of packages which are false-positive diskspace issues. []
    • Detect failures to bootstrap Debian base systems. []
  • Arch Linux-related changes:

    • Temporarily disable builds because the pacman package manager is broken. [][]
    • Split reproducible_html_live_status and split the scheduling timing . [][][]
    • Improve handling when database is locked. [][]
  • Misc changes:

    • Show failed services that require manual cleanup. [][]
    • Integrate two new Infomaniak nodes. [][][][]
    • Improve IRC notifications for artifacts. []
    • Run diffoscope in different systemd slices. []
    • Run the node health check more often, as it can now repair some issues. [][]
    • Also include the string Bot in the userAgent for Git. (Re: #929013). []
    • Document increased tmpfs size on our OSUOSL nodes. []
    • Disable memory account for the reproducible_build service. [][]
    • Allow 10 times as many open files for the Jenkins service. []
    • Set OOMPolicy=continue and OOMScoreAdjust=-1000 for both the Jenkins and the reproducible_build service. []

Mattia Rizzolo also made the following changes:

  • Debian-related changes:

    • Define a systemd slice to group all relevant services. [][]
    • Add a bunch of quotes in scripts to assuage the shellcheck tool. []
    • Add stats on how many packages have been built today so far. []
    • Instruct systemd-run to handle diffoscope’s exit codes specially. []
    • Prefer the pgrep tool over grepping the output of ps. []
    • Re-enable a couple of i386 and armhf architecture builders. [][]
    • Fix some stylistic issues flagged by the Python flake8 tool. []
    • Cease scheduling Debian unstable and experimental on the armhf architecture due to the time_t transition. []
    • Start a few more i386 & armhf workers. [][][]
    • Temporarly skip pbuilder updates in the unstable distribution, but only on the armhf architecture. []
  • Other changes:

    • Perform some large-scale refactoring on how the systemd service operates. [][]
    • Move the list of workers into a separate file so it’s accessible to a number of scripts. []
    • Refactor the powercycle_x86_nodes.py script to use the new IONOS API and its new Python bindings. []
    • Also fix nph-logwatch after the worker changes. []
    • Do not install the stunnel tool anymore, it shouldn’t be needed by anything anymore. []
    • Move temporary directories related to Arch Linux into a single directory for clarity. []
    • Update the arm64 architecture host keys. []
    • Use a common Postfix configuration. []

The following changes were also made by:

  • Jan-Benedict Glaw:

    • Initial work to clean up a messy NetBSD-related script. [][]
  • Roland Clobus:

    • Show the installer log if the installer fails to build. []
    • Avoid the minus character (i.e. -) in a variable in order to allow for tags in openQA. []
    • Update the schedule of Debian live image builds. []
  • Vagrant Cascadian:

    • Maintenance on the virt* nodes is completed so bring them back online. []
    • Use the fully qualified domain name in configuration. []

Node maintenance was also performed by Holger Levsen, Mattia Rizzolo [][] and Vagrant Cascadian [][][][]



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:




View all our monthly reports