Reproducible Builds in September 2024

View all our monthly reports


Welcome to the September 2024 report from the Reproducible Builds project!

Our reports attempt to outline what we’ve been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Table of contents:

  1. New binsider tool to analyse ELF binaries
  2. Unreproducibility of GHC Haskell compiler “95% fixed”
  3. Mailing list summary
  4. Towards a 100% bit-for-bit reproducible OS…
  5. Two new reproducibility-related academic papers
  6. Distribution work
  7. diffoscope
  8. Other software development
  9. Android toolchain core count issue reported
  10. New Gradle plugin for reproducibility
  11. Website updates
  12. Upstream patches
  13. Reproducibility testing framework

New binsider tool to analyse ELF binaries

Reproducible Builds developer Orhun Parmaksız has announced a fantastic new tool to analyse the contents of ELF binaries. According to the project’s README page:

Binsider can perform static and dynamic analysis, inspect strings, examine linked libraries, and perform hexdumps, all within a user-friendly terminal user interface!

More information about Binsider’s features and how it works can be found within Binsider’s documentation pages.


Unreproducibility of GHC Haskell compiler “95% fixed”

A seven-year-old bug about the nondeterminism of object code generated by the Glasgow Haskell Compiler (GHC) received a recent update, consisting of Rodrigo Mesquita noting that the issue is:

95% fixed by [merge request] !12680 when -fobject-determinism is enabled. []

The linked merge request has since been merged, and Rodrigo goes on to say that:

After that patch is merged, there are some rarer bugs in both interface file determinism (eg. #25170) and in object determinism (eg. #25269) that need to be taken care of, but the great majority of the work needed to get there should have been merged already. When merged, I think we should close this one in favour of the more specific determinism issues like the two linked above.


Mailing list summary

On our mailing list this month:

  • Fay Stegerman let everyone know that she started a thread on the Fediverse about the problems caused by unreproducible zlib/deflate compression in .zip and .apk files and later followed up with the results of her subsequent investigation.

  • Long-time developer kpcyrd wrote that “there has been a recent public discussion on the Arch Linux GitLab [instance] about the challenges and possible opportunities for making the Linux kernel package reproducible”, all relating to the CONFIG_MODULE_SIG flag. []

  • Bernhard M. Wiedemann followed-up to an in-person conversation at our recent Hamburg 2024 summit on the potential presence for Reproducible Builds in recognised standards. []

  • Fay Stegerman also wrote about her worry about the “possible repercussions for RB tooling of Debian migrating from zlib to zlib-ng” as reproducibility requires identical compressed data streams. []

  • Martin Monperrus wrote the list announcing the latest release of maven-lockfile that is designed aid “building Maven projects with integrity”. []

  • Lastly, Bernhard M. Wiedemann wrote about potential role of reproducible builds in combatting silent data corruption, as detailed in a recent Tweet and scholarly paper on faulty CPU cores. []


Towards a 100% bit-for-bit reproducible OS…

Bernhard M. Wiedemann began writing on journey towards a 100% bit-for-bit reproducible operating system on the openSUSE wiki:

This is a report of Part 1 of my journey: building 100% bit-reproducible packages for every package that makes up [openSUSE’s] minimalVM image. This target was chosen as the smallest useful result/artifact. The larger package-sets get, the more disk-space and build-power is required to build/verify all of them.

This work was sponsored by NLnet’s NGI Zero fund.


Marvin Strangfeld published his bachelor thesis, “Reproducibility of Computational Environments for Software Development” from RWTH Aachen University. The author offers a more precise theoretical definition of computational environments compared to previous definitions, which can be applied to describe real-world computational environments. Additionally, Marvin provide a definition of reproducibility in computational environments, enabling discussions about the extent to which an environment can be made reproducible. The thesis is available to browse or download in PDF format.

In addition, Shenyu Zheng, Bram Adams and Ahmed E. Hassan of Queen’s University, ON, Canada have published an article on “hermeticity” in Bazel-based build systems:

A hermetic build system manages its own build dependencies, isolated from the host file system, thereby securing the build process. Although, in recent years, new artifact-based build technologies like Bazel offer build hermeticity as a core functionality, no empirical study has evaluated how effectively these new build technologies achieve build hermeticity. This paper studies 2,439 non-hermetic build dependency packages of 70 Bazel-using open-source projects by analyzing 150 million Linux system file calls collected in their build processes. We found that none of the studied projects has a completely hermetic build process, largely due to the use of non-hermetic top-level toolchains. []


Distribution work

In Debian this month, 14 reviews of Debian packages were added, 12 were updated and 20 were removed, all adding to our knowledge about identified issues. A number of issue types were updated as well. [][]

In addition, Holger opened 4 bugs against the debrebuild component of the devscripts suite of tools. In particular:

  • #1081047: Fails to download .dsc file.
  • #1081048: Does not work with a proxy.
  • #1081050: Fails to create a debrebuild.tar.
  • #1081839: Fails with E: mmdebstrap failed to run error.

Last month, an issue was filed to update the Salsa CI pipeline (used by 1,000s of Debian packages) to no longer test for reproducibility with reprotest’s build_path variation. Holger Levsen provided a rationale for this change in the issue, which has already been made to the tests being performed by tests.reproducible-builds.org. This month, this issue was closed by Santiago R. R., nicely explaining that build path variation is no longer the default, and, if desired, how developers may enable it again.

In openSUSE news, Bernhard M. Wiedemann published another report for that distribution.


diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading version 278 to Debian:

  • New features:

    • Add a helpful contextual message to the output if comparing Debian .orig tarballs within .dsc files without the ability to “fuzzy-match” away the leading directory.  []
  • Bug fixes:

    • Drop removal of calculated os.path.basename from GNU readelf output. []
    • Correctly invert “X% similar” value and do not emit “100% similar”. []
  • Misc:

    • Temporarily remove procyon-decompiler from Build-Depends as it was removed from testing (via #1057532). (#1082636)
    • Update copyright years. []

For trydiffoscope, the command-line client for the web-based version of diffoscope, Chris Lamb also:

  • Added an explicit python3-setuptools dependency. (#1080825)
  • Bumped the Standards-Version to 4.7.0. []


Other software development

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, version 0.5.11-4 was uploaded to Debian unstable by Holger Levsen making the following changes:

  • Replace build-dependency on the obsolete pkg-config package with one on pkgconf, following a Lintian check. []
  • Bump Standards-Version field to 4.7.0, with no related changes needed. []


In addition, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, version 0.7.28 was uploaded to Debian unstable by Holger Levsen including a change by Jelle van der Waa to move away from the pipes Python module to shlex, as the former will be removed in Python version 3.13 [].


Android toolchain core count issue reported

Fay Stegerman reported an issue with the Android toolchain where a part of the build system generates a different classes.dex file (and thus a different .apk) depending on the number of cores available during the build, thereby breaking Reproducible Builds:

We’ve rebuilt [tag v3.6.1] multiple times (each time in a fresh container): with 2, 4, 6, 8, and 16 cores available, respectively:

  • With 2 and 4 cores we always get an unsigned APK with SHA-256 14763d682c9286ef….
  • With 6, 8, and 16 cores we get an unsigned APK with SHA-256 35324ba4c492760… instead.


New Gradle plugin for reproducibility

A new plugin for the Gradle build tool for Java has been released. This easily-enabled plugin results in:

reproducibility settings [being] applied to some of Gradle’s built-in tasks that should really be the default. Compatible with Java 8 and Gradle 8.3 or later.


Website updates

There were a rather substantial number of improvements made to our website this month, including:


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:


Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In September, a number of changes were made by Holger Levsen, including:

  • Debian-related changes:

    • Upgrade the osuosl4 node to Debian trixie in anticipation of running debrebuild and rebuilderd there. [][][]
    • Temporarily mark the osuosl4 node as offline due to ongoing xfs_repair filesystem maintenance. [][]
    • Do not warn about (very old) broken nodes. []
    • Add the risc64 architecture to the multiarch version skew tests for Debian trixie and sid. [][][]
    • Mark the virt{32,64}b nodes as down. []
  • Misc changes:

    • Add support for powercycling OpenStack instances. []
    • Update the fail2ban to ban hosts for 4 weeks in total [][] and take care to never ban our own Jenkins instance. []

In addition, Vagrant Cascadian recorded a disk failure for the virt32b and virt64b nodes [], performed some maintenance of the cbxi4a node [][] and marked most armhf architecture systems as being back online.



Finally, If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:




View all our monthly reports