Welcome to the May 2022 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Zhilei Ren, Shiwei Sun, Jifeng Xuan, Xiaochen Li, Zhide Zhou and He Jiang have published an academic paper titled Automated Patching for Unreproducible Builds:
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package which has remained unapplied by the Debian package maintainer for over seven years; this paper thus inadvertently underscores that achieving reproducible builds will require both technical and social solutions.
Johannes Schauer discovered a fascinating bug where simply naming your Python variable _m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
```diff
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
```
Simply renaming the dummy method from _m to _b was enough to work around the problem. Johannes’ bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files, but upstream identified the cause as an issue surrounding interned strings, which is being tracked in CPython bug #78274.
New SPDX team to incorporate build metadata in Software Bill of Materials
SPDX, the open standard for Software Bill of Materials (SBOM), is continuously developed by a number of teams and committees. Recently, SPDX has welcomed a new addition: a team dedicated to enhancing metadata about software builds, complementing reproducible builds in creating a more secure software supply chain. The “SPDX Builds Team” has been working throughout May to define the universal primitives shared by all build systems, including the “who, what, where and how” of builds:
Who: the identity of the person or organisation that controls the build infrastructure.
What: the inputs and outputs of a given build, combining metadata about the build’s configuration with an SBOM describing source code and dependencies.
How: the invocation of a build, linking metadata of a build to the identity of the person or automation tool that initiated it.
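To make the primitives listed above a little more concrete, here is a hypothetical sketch of a build-metadata record in Python. The class and field names (BuildRecord, controller, sbom_ref and so on) are illustrative assumptions and are not the SPDX 3.0 data model, which the team had not yet published; only the three primitives described above are modelled.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Who:
    # Person or organisation that controls the build infrastructure.
    controller: str


@dataclass
class What:
    # Build configuration plus the inputs and outputs of the build.
    config: dict[str, str]
    inputs: list[str]
    outputs: list[str]
    # Hypothetical pointer to an SBOM describing source code and dependencies.
    sbom_ref: str


@dataclass
class How:
    # The command that was run and who (or what) initiated it.
    invocation: list[str]
    initiated_by: str


@dataclass
class BuildRecord:
    who: Who
    what: What
    how: How

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses; sort_keys keeps the
        # serialised form byte-stable for identical records.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


if __name__ == "__main__":
    record = BuildRecord(
        who=Who(controller="Example Org"),
        what=What({"CC": "gcc"}, ["src.tar.gz"], ["app"], "sbom.spdx.json"),
        how=How(invocation=["make", "all"], initiated_by="ci-bot"),
    )
    print(record.to_json())
```

Sorting keys on output is a small design choice worth noting: if build metadata is itself meant to be compared across rebuilds, its serialisation should not vary between runs.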
The SPDX Builds Team expects to have a usable data model by September, ready for inclusion in the SPDX 3.0 standard. The team welcomes new contributors, inviting those interested in joining to introduce themselves on the SPDX-Tech mailing list.
Talks at Debian Reunion Hamburg
Some members of the Reproducible Builds team (Holger Levsen, Mattia Rizzolo, Roland Clobus, Philip Rinn, etc.) met in person at the Debian Reunion Hamburg (official homepage). There were several informal discussions amongst them, as well as two talks related to reproducible builds.
First, Holger Levsen gave a talk on the status of Reproducible Builds for bullseye and bookworm and beyond (WebM, 210MB).
Secondly, Roland Clobus gave a talk called Reproducible builds as applied to non-compiler output (WebM, 115MB).
Supply-chain security attacks
This was another bumper month for supply-chain attacks in package repositories. Early in the month, Lance R. Vick noticed that the maintainer of the NPM foreach package had let their personal email domain expire, so Lance bought it and now “controls foreach on NPM and the 36,826 projects that depend on it”. Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the “correct way to ship packages is with your distribution’s package manager”.
“Bootstrapping” is a process for building software tools progressively from a primitive compiler tool and source language up to a full Linux development environment with GCC, etc. This is important given the amount of trust we put in existing compiler binaries. This month, a bootstrappable mini-kernel was announced. Called boot2now, it comprises a series of compilers in the form of bootable machine images.
Google’s new Assured Open Source Software service
Google Cloud (the division responsible for the Google Compute Engine) announced a new Assured Open Source Software service. Noting the considerable 650% year-over-year increase in cyberattacks aimed at open source suppliers, the new service claims to enable “enterprise and public sector users of open source software to easily incorporate the same OSS packages that Google uses into their own developer workflows”. The announcement goes on to state that packages curated by the new service:
Are regularly scanned, analyzed, and fuzz-tested for vulnerabilities.
Have corresponding enriched metadata incorporating Container/Artifact Analysis data.
Are built with Cloud Build, including evidence of verifiable SLSA compliance.
Are verifiably signed by Google.
Are distributed from an Artifact Registry secured and protected by Google.
A retrospective on the Rust programming language
Andrew “bunnie” Huang published a long blog post this month promising a “critical retrospective” on the Rust programming language. Amongst many acute observations about the evolution of the language’s syntax (etc.), the post also critiques the language’s approach to supply chain security (“Rust Has A Limited View of Supply Chain Security”) and reproducibility (“You Can’t Reproduce Someone Else’s Rust Build”):
There’s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say “almost anyone else” because this fix would be OS-dependent, so we’d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
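The quoted post assumes the unreproducibility comes from OS paths being embedded in the binary. A quick way to test that assumption on a particular artifact is to search it for the build directory. The following is a minimal sketch; the function name find_embedded_paths and the heuristic of stopping at a NUL byte or newline are our own assumptions, and the code only detects paths rather than rewriting them.

```python
import sys
from pathlib import Path


def find_embedded_paths(data: bytes, prefix: bytes, max_tail: int = 256) -> list[bytes]:
    """Return every occurrence of `prefix` in `data`, each extended up to the
    next NUL byte or newline (or at most `max_tail` extra bytes)."""
    hits: list[bytes] = []
    start = data.find(prefix)
    while start != -1:
        end = start + len(prefix)
        limit = min(len(data), end + max_tail)
        # Strings in compiled binaries are commonly NUL-terminated.
        while end < limit and data[end] not in (0x00, 0x0A):
            end += 1
        hits.append(data[start:end])
        # Continue searching after this prefix (no overlapping matches).
        start = data.find(prefix, start + len(prefix))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_paths.py target/release/app /home/alice/project
    if len(sys.argv) == 3:
        blob = Path(sys.argv[1]).read_bytes()
        for hit in find_embedded_paths(blob, sys.argv[2].encode()):
            print(hit.decode(errors="replace"))
```

If the only hits are the build directory, building in a fixed path (as the post suggests with a chroot or VM) should remove them; any other differences point to further sources of nondeterminism.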
Reproducible Builds IRC meeting
The minutes and logs from our May 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 28th June at 15:00 UTC on #reproducible-builds on the OFTC network.
A new tool to improve supply-chain security in Arch Linux
kpcyrd published yet another interesting tool related to reproducibility. Writing about the tool in a recent blog post, kpcyrd mentions that although many PKGBUILDs provide authentication in the context of signed Git tags (i.e. the ability to “verify the Git tag was signed by one of the two trusted keys”), they do not support pinning; without it, “upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing”. Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as neatly outlined in kpcyrd’s original blog post.
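Pinning, in the sense used above, means recording a cryptographic digest of the exact source artifact so that any later change is noticed, even if the replacement is still validly signed. The following is a minimal Python sketch of the pinning half only, assuming a SHA-256 digest recorded alongside the build recipe (much as the sha256sums array in a PKGBUILD does). Authentication, i.e. checking the signature on the Git tag, is not shown, and this is not a description of how auth-tarball-from-git works internally.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file through SHA-256 so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, pinned_sha256: str) -> bool:
    """Return True only if the file's SHA-256 matches the pinned hex digest."""
    return hmac.compare_digest(sha256_file(path), pinned_sha256.strip().lower())


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a downloaded source tarball.
    with tempfile.TemporaryDirectory() as tmp:
        tarball = Path(tmp) / "example-1.0.tar.gz"
        tarball.write_bytes(b"pretend tarball contents")
        pin = hashlib.sha256(b"pretend tarball contents").hexdigest()
        print(verify_pinned(tarball, pin))  # True
        # Simulate upstream re-tagging with different contents.
        tarball.write_bytes(b"silently re-tagged contents")
        print(verify_pinned(tarball, pin))  # False
```

The two checks are complementary: a signature says who produced the artifact, while a pinned digest says it is the same artifact the maintainer reviewed.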
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions … 214 to Debian unstable.
Chris also made the following changes:
- Substantially update comment for our calls to … zipinfo -v. […]
- … get_data with a separate …
- Don’t call re.compile and then call .sub on the result; just call …
- Clarify the comment around the difference between …
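The re.compile item above is truncated, but the simplification it points at is presumably calling the module-level function directly. As an illustration (not diffoscope’s actual code, and with a made-up input string), both forms below produce the same result; the re module caches recently used patterns internally, so the one-step form does not recompile on every call.

```python
import re

TEXT = "build-id: 1234abcd path: /build/pkg-1.0"

# Two-step form: compile the pattern, then call .sub() on the compiled object.
compiled = re.compile(r"/build/[^ ]+")
two_step = compiled.sub("/build/<redacted>", TEXT)

# One-step form: re.sub() compiles (and caches) the pattern itself.
one_step = re.sub(r"/build/[^ ]+", "/build/<redacted>", TEXT)

print(one_step == two_step)  # True
print(one_step)  # build-id: 1234abcd path: /build/<redacted>
```

Keeping an explicit compiled pattern still makes sense when it is reused heavily or stored as a named constant; for a single call, the one-step form is shorter and equally clear.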
In Debian, 41 reviews of Debian packages were added, 85 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types have been updated, including adding a new nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue […] as well as ones for extended_attributes_in_jar_file_created_without_manifest […] and …
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Bernhard M. Wiedemann:
Johannes Schauer Marin Rodrigues:
- #1010462 filed against
- #1010463 filed against
- #1010466 filed against
- #1010483 filed against
- #1010486 filed against
- #1010781 filed against
- #1010785 filed against
- #1010787 filed against
- #1010789 filed against
- #1010790 filed against
- #1010825 filed against
- #1010828 filed against
- #1010830 filed against
- #1010859 filed against
- #1010870 filed against
- #1010871 filed against
- #1010872 filed against
- #1010944 filed against
- #1010948 filed against
- #1011034 filed against
- #1011036 filed against
- #1011104 filed against
- #1011109 filed against
- #1011257 filed against
- #1011402 filed against
- #1011405 filed against
- #1011428 filed against
- #1011429 filed against
- #1011469 filed against
- #1011470 filed against
- #1011471 filed against
- #1011478 filed against
- #1011479 filed against
- #1011480 filed against
- #1011481 filed against
- #1011486 filed against
- #1011488 filed against
- #1011489 filed against
- #1011490 filed against
- #1011491 filed against
- #1011493 filed against
- #1011495 filed against
- #1011496 filed against
- #1011498 filed against
- #1011499 filed against
- #1011500 filed against
- #1011501 filed against
- #1011503 filed against
- lcsync (remove build paths)
Reproducible builds website
Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but also prepared and published an interview with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix. […][…][…][…]
The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
On our mailing list this month:
John Neffenger posted that the early-access release of OpenJDK version 19 build 21 is reproducible.
Mattia Rizzolo posted a request about tentatively planning a Reproducible Builds summit in 2022.
Bernhard M. Wiedemann posted about a Reproducible Builds meetup at the openSUSE conference in Nuremberg.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via: