Reproducible Builds in January 2025

View all our monthly reports


Welcome to the first report of 2025 from the Reproducible Builds project!

Our monthly reports outline what we’ve been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As usual, though, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.


reproduce.debian.net

The last few months saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. rebuilderd is our server designed to monitor the official package repositories of Linux distributions and to attempt to reproduce the results observed there.

This month, however, we are pleased to announce that in addition to the existing amd64.reproduce.debian.net and i386.reproduce.debian.net architecture-specific pages, we now build for three more architectures (for a total of five): arm64, armhf and riscv64.
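The core check behind this kind of rebuilding can be sketched in a few lines: rebuild a package, then compare the result byte-for-byte (here via SHA-256) with the artifact published in the official archive. This is only an illustrative sketch and not rebuilderd's actual implementation; the function names `sha256_of` and `is_reproduced` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(official: Path, rebuilt: Path) -> bool:
    """Hypothetical helper: a rebuild counts as reproduced only if the
    rebuilt artifact is bit-for-bit identical to the official one."""
    return sha256_of(official) == sha256_of(rebuilt)
```

Any single differing byte, such as an embedded build date, changes the digest and makes the comparison fail, which is why tools like diffoscope are then needed to locate the difference.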


Two new academic papers

Giacomo Benedetti, Oreofe Solarin, Courtney Miller, Greg Tystahl, William Enck, Christian Kästner, Alexandros Kapravelos, Alessio Merlo and Luca Verderame published an interesting article recently. Titled An Empirical Study on Reproducible Packaging in Open-Source Ecosystem, the abstract outlines its optimistic findings:

[We] identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.

The entire PDF is available online to view.


In addition, Julien Malka, Stefano Zacchiroli and Théo Zimmermann of Télécom Paris’ in-house research laboratory, the Information Processing and Communications Laboratory (LTCI), published an article asking the question: Does Functional Package Management Enable Reproducible Builds at Scale?

Answering strongly in the affirmative, the article’s abstract reads as follows:

In this work, we perform the first large-scale study of bitwise reproducibility, in the context of the Nix functional package manager, rebuilding 709,816 packages from historical snapshots of the nixpkgs repository[. We] obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%. We investigate unreproducibility causes, showing that about 15% of failures are due to embedded build dates. We release a novel dataset with all build statuses, logs, as well as full diffoscopes: recursive diffs of where unreproducible build artifacts differ.

As above, the entire PDF of the article is available to view online.


Distribution work

There has been the usual work in various distributions this month, such as:

  • 10+ reviews of Debian packages were added, 11 were updated and 10 were removed this month, adding to our knowledge about identified issues. A number of issue types were also updated.

  • The FreeBSD Foundation announced that “a planned project to deliver zero-trust builds has begun in January 2025”. Supported by the Sovereign Tech Agency, this project is centered on the various build processes, and the “primary goal of this work is to enable the entire release process to run without requiring root access, and that build artifacts build reproducibly – that is, that a third party can build bit-for-bit identical artifacts.” The full announcement, which includes an estimated schedule and other details, can be found online.


On our mailing list…

On our mailing list this month:

  • Following up on a substantial amount of previous work pertaining to the Sphinx documentation generator, James Addison asked a question about the relationship between the SOURCE_DATE_EPOCH environment variable and testing, which generated a number of replies.

  • Adithya Balakumar of Toshiba asked a question about whether it is possible to make ext4 filesystem images reproducible. Adithya’s issue is that even the smallest amount of post-processing of the filesystem results in the modification of the “Last mount” and “Last write” timestamps.

  • James Addison also investigated an interesting issue surrounding our disorderfs filesystem. In particular:

    FUSE (Filesystem in USErspace) filesystems such as disorderfs do not delete files from the underlying filesystem when they are deleted from the overlay. This can cause seemingly straightforward tests — for example, cases that expect directory contents to be empty after deletion is requested for all files listed within them — to fail.
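For readers unfamiliar with the SOURCE_DATE_EPOCH variable mentioned in the first item above, the following minimal sketch shows how a build tool might honour it: if the variable is set, its value (seconds since the Unix epoch, in UTC) is used in place of the current time. The function name `build_timestamp` and its `environ` parameter are hypothetical and exist only for illustration; this is not how Sphinx or any particular tool implements it.

```python
import os
from datetime import datetime, timezone


def build_timestamp(environ=None) -> datetime:
    """Return the timestamp a build should embed in its output.

    Prefers SOURCE_DATE_EPOCH when set, falling back to the current
    time otherwise. `environ` is a hypothetical parameter that makes
    the function easy to test without touching the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is not None:
        # The variable holds an integer number of seconds since the epoch.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.now(tz=timezone.utc)


# With the variable set, the result is the same on every run.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1735689600"})
print(stamp.isoformat())  # 2025-01-01T00:00:00+00:00
```

Accepting the environment as an argument is one way to sidestep the testing question raised in that thread, since tests can pass a fixed mapping instead of mutating the process environment.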


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches.


diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 285, 286 and 287 to Debian:

  • Security fixes:

    • Validate the --css command-line argument to prevent a potential Cross-site scripting (XSS) attack. Thanks to Daniel Schmidt from SRLabs for the report.
    • Prevent XML entity expansion attacks. Thanks to Florian Wilkens from SRLabs for the report.
    • Print a warning if we have disabled XML comparisons due to a potentially vulnerable version of pyexpat.
  • Bug fixes:

    • Correctly identify changes to only the line-endings of files; don’t mark them as Ordering differences only.
    • When passing files on the command line, don’t call specialize(…) before we’ve checked whether the files are identical or not.
    • Do not exit with a traceback if paths are inaccessible, either directly, via symbolic links or within a directory.
    • Don’t cause a traceback if cbfstool extraction failed.
    • Use the surrogateescape mechanism to avoid a UnicodeDecodeError and crash when decoding any zipinfo output that is not UTF-8 compliant.
  • Testsuite improvements:

    • Don’t mangle newlines when opening test fixtures; we want them untouched.
    • Move to assert_diff in test_text.py.
  • Misc improvements:

    • Drop unused subprocess imports.
    • Drop an unused function in iso9660.py.
    • Inline a call and check of Config().force_details; no need for an additional variable in this particular method.
    • Remove an unnecessary return value from the Difference.check_for_ordering_differences method.
    • Remove unused logging facility from a few comparators.
    • Update copyright years.
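As background to the surrogateescape fix listed above, the sketch below shows the standard Python error handler at work: bytes that are not valid UTF-8 are mapped to lone surrogate code points instead of raising UnicodeDecodeError, and the original bytes can be recovered later. The helper name `decode_lenient` and the sample bytes are assumptions for illustration only and are not taken from diffoscope's source.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode bytes as UTF-8, smuggling undecodable bytes through as
    lone surrogates rather than raising UnicodeDecodeError."""
    return raw.decode("utf-8", errors="surrogateescape")


# A Latin-1 encoded filename: the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("strict decoding fails")

text = decode_lenient(raw)

# The round trip is lossless, so no information about the input is lost.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

Because the mapping is reversible, a tool can keep processing and displaying such output without crashing, while still being able to reconstruct the exact bytes it received.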

In addition, fridtjof added support for the ASAR .tar-like archive format and, lastly, Vagrant Cascadian updated diffoscope in GNU Guix to versions 285 and 286.


strip-nondeterminism is our sister tool to remove specific non-deterministic results from a completed build. This month version 1.14.1-1 was uploaded to Debian unstable by Chris Lamb, making the following changes:

  • Clarify the --verbose and non --verbose output of bin/strip-nondeterminism so we don’t imply we are normalizing files that we are not.
  • Bump Standards-Version to 4.7.0.


Website updates

There were a large number of improvements made to our website this month.


Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen, including:

  • reproduce.debian.net-related:

    • Add support for rebuilding the armhf architecture.
    • Add support for rebuilding the arm64 architecture.
    • Add support for rebuilding the riscv64 architecture.
    • Move the i386 builder to the osuosl5 node.
    • Don’t run our rebuilders on a public port.
    • Add database backups on all builders and add links.
    • Rework and dramatically improve the statistics collection and generation.
    • Add contact info to the main page, thumbnails, as well as the new, missing architectures.
    • Move the amd64 worker to the osuosl4 node.
    • Run the underlying debrebuild script under nice.
    • Try to use TMPDIR when calling debrebuild.
  • buildinfos.debian.net-related:

    • Stop creating buildinfo-pool_${suite}_${arch}.list files.
    • Temporarily disable automatic updates of pool links.
  • FreeBSD-related:

    • Fix the sudoers to actually permit builds.
    • Disable debug output for FreeBSD rebuilding jobs.
    • Upgrade to FreeBSD 14.2 and document that bmake was installed on the underlying FreeBSD virtual machine image.
  • Misc:

    • Update the ‘real’ year to 2025.
    • Don’t try to install a Debian bookworm kernel from ‘backports’ on the infom08 node which is running Debian trixie.
    • Don’t warn about system updates for systems running Debian testing.
    • Fix a typo in the ZOMBIES definition.

In addition:

  • Ed Maste modified the FreeBSD build system to clean the object directory before commencing a build.

  • Gioele Barabucci updated the rebuilder stats to add a category for network errors as well as to categorise failures without a diffoscope log.

  • Jessica Clarke also made some FreeBSD-related changes, including:

    • Ensuring we clean up the object directory for the second build as well.
    • Updating the sudoers for the relevant rm -rf command.
    • Updating the cleanup_tmpdirs method to match other removals.
  • Jochen Sprickerhof:

  • Roland Clobus:

    • Update the reproducible_debstrap job to call Debian’s debootstrap with the full path and to use eatmydata as well.
    • Make some changes to reduce the CPU load in the debian_live_build job.

Lastly, both Holger Levsen and Vagrant Cascadian performed some node maintenance.


Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.



