Welcome to the May 2022 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
RepFix paper
Zhilei Ren, Shiwei Sun, Jifeng Xuan, Xiaochen Li, Zhide Zhou and He Jiang have published an academic paper titled Automated Patching for Unreproducible Builds:
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package that has remained unapplied by the Debian package maintainer for over seven years, so this paper inadvertently underscores that achieving reproducible builds will require both technical and social solutions.
Python variables
Johannes Schauer discovered a fascinating bug where simply naming your Python variable _m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
Simply renaming the dummy method from _m to _b was enough to work around the problem. Johannes’ bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files; upstream then identified the cause as an issue surrounding interned strings, which is being tracked in CPython bug #78274.
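As a rough illustration of how one might look for this class of problem, the sketch below hashes the marshalled code object that forms the payload of a .pyc file. The function name bytecode_digest and the two source snippets are assumptions made for this example; this is not how diffoscope or CPython's own tooling works, and a single run is not expected to reproduce the bug itself.

```python
import hashlib
import marshal


def bytecode_digest(source: str, filename: str = "<example>") -> str:
    """Return the SHA-256 hex digest of the marshalled code object for `source`.

    The marshalled code object is what a .pyc file stores after its header.
    """
    code = compile(source, filename, "exec")
    return hashlib.sha256(marshal.dumps(code)).hexdigest()


# Hypothetical stand-ins for the two spellings of the dummy method.
ORIGINAL = "class _C:\n    def _m(self): pass\nMethodType = type(_C()._m)\n"
PATCHED = "class _C:\n    def _b(self): pass\nMethodType = type(_C()._b)\n"

if __name__ == "__main__":
    for label, src in (("original", ORIGINAL), ("patched", PATCHED)):
        print(label, bytecode_digest(src))
```

Running the script in several fresh interpreter processes and comparing the digests printed for the same source gives a crude determinism check: a digest that changes between runs would suggest the bytecode is not being serialised reproducibly.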
New SPDX team to incorporate build metadata in Software Bill of Materials
SPDX, the open standard for Software Bill of Materials (SBOM), is continuously developed by a number of teams and committees. SPDX has now welcomed a new addition: a team dedicated to enhancing metadata about software builds, complementing reproducible builds in creating a more secure software supply chain. The “SPDX Builds Team” has been working throughout May to define the universal primitives shared by all build systems, including the “who, what, where and how” of builds:
-
Who: the identity of the person or organisation that controls the build infrastructure.
-
What: the inputs and outputs of a given build, combining metadata about the build’s configuration with an SBOM describing source code and dependencies.
-
Where: the software packages making up the build system, from build orchestration tools such as Woodpecker CI and Tekton to language-specific tools.
-
How: the invocation of a build, linking metadata of a build to the identity of the person or automation tool that initiated it.
The SPDX Builds Team expects to have a usable data model by September, ready for inclusion in the SPDX 3.0 standard. The team welcomes new contributors, inviting those interested in joining to introduce themselves on the SPDX-Tech mailing list.
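To make the four primitives above a little more concrete, here is a purely illustrative sketch of a record that captures them. The class name BuildRecord, its field names and the sample values are all assumptions invented for this example; they are not taken from SPDX, and the eventual SPDX data model may look quite different.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BuildRecord:
    """Hypothetical record; field names are illustrative, not SPDX terms."""

    who: str                 # identity that controls the build infrastructure
    what_inputs: List[str]   # sources, dependencies and build configuration
    what_outputs: List[str]  # artifacts produced by the build
    where: List[str]         # tools that make up the build system
    how: str                 # person or automation that invoked the build


# Made-up values, used only to show the shape of the record.
record = BuildRecord(
    who="example.org build farm",
    what_inputs=["src.tar.gz", "build.conf"],
    what_outputs=["package.deb"],
    where=["Tekton", "gcc"],
    how="ci-bot",
)

print(asdict(record))
```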
Talks at Debian Reunion Hamburg
Some of the Reproducible Builds team (Holger Levsen, Mattia Rizzolo, Roland Clobus, Philip Rinn, etc.) met in real life at the Debian Reunion Hamburg (official homepage). There were several informal discussions amongst them, as well as two talks related to reproducible builds.
First, Holger Levsen gave a talk on the status of Reproducible Builds for bullseye and bookworm and beyond (WebM, 210MB).
Secondly, Roland Clobus gave a talk called Reproducible builds as applied to non-compiler output (WebM, 115MB).
Supply-chain security attacks
This was another bumper month for supply-chain attacks in package repositories. Early in the month, Lance R. Vick noticed that the maintainer of the NPM foreach package let their personal email domain expire, so Vick bought it and now “controls foreach on NPM and the 36,826 projects that depend on it”. Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the “correct way to ship packages is with your distribution’s package manager”.
Bootstrapping
“Bootstrapping” is a process for building software tools progressively from a primitive compiler tool and source language up to a full Linux development environment with GCC, etc. This is important given the amount of trust we put in existing compiler binaries. This month, a bootstrappable mini-kernel was announced. Called boot2now, it comprises a series of compilers in the form of bootable machine images.
Google’s new Assured Open Source Software service
Google Cloud (the division responsible for the Google Compute Engine) announced a new Assured Open Source Software service. Noting the considerable 650% year-over-year increase in cyberattacks aimed at open source suppliers, the new service claims to enable “enterprise and public sector users of open source software to easily incorporate the same OSS packages that Google uses into their own developer workflows”. The announcement goes on to enumerate the properties of packages curated by the new service:
-
Regularly scanned, analyzed, and fuzz-tested for vulnerabilities.
-
Have corresponding enriched metadata incorporating Container/Artifact Analysis data.
-
Are built with Cloud Build including evidence of verifiable SLSA-compliance.
-
Are verifiably signed by Google.
-
Are distributed from an Artifact Registry secured and protected by Google.
A retrospective on the Rust programming language
Andrew “bunnie” Huang published a long blog post this month promising a “critical retrospective” on the Rust programming language. Amongst many acute observations about the evolution of the language’s syntax (etc.), the post begins to critique the language’s approach to supply chain security (“Rust Has A Limited View of Supply Chain Security”) and reproducibility (“You Can’t Reproduce Someone Else’s Rust Build”):
There’s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say “almost anyone else” because this fix would be OS-dependent, so we’d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
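The quoted assumption, that embedded OS paths are the driver of the unreproducibility, can be checked in a simple way by scanning a built binary for path prefixes. The following is a minimal sketch under that assumption; the function name find_embedded_paths, the sample bytes and the list of prefixes are hypothetical and not taken from the blog post.

```python
from typing import Iterable, List, Tuple


def find_embedded_paths(data: bytes, prefixes: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (prefix, offset) pairs for every occurrence of each prefix in data."""
    hits = []
    for prefix in prefixes:
        needle = prefix.encode("utf-8")
        start = data.find(needle)
        while start != -1:
            hits.append((prefix, start))
            start = data.find(needle, start + 1)
    return hits


# Hypothetical binary contents containing a developer's home directory.
sample = b"\x7fELF...panicked at /home/alice/project/src/main.rs\x00..."

print(find_embedded_paths(sample, ["/home/", "/Users/"]))
```

In practice one would read the bytes from the compiled artifact and compare the hits between two builds made in different directories; any path that differs is a candidate cause of the mismatch.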
Reproducible Builds IRC meeting
The minutes and logs from our May 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 28th June at 15:00 UTC on #reproducible-builds on the OFTC network.
A new tool to improve supply-chain security in Arch Linux
kpcyrd published yet another interesting tool related to reproducibility. Writing about the tool in a recent blog post, kpcyrd mentions that although many PKGBUILDs provide authentication in the context of signed Git tags (i.e. the ability to “verify the Git tag was signed by one of the two trusted keys”), they do not support pinning, i.e. that “upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing”. Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as neatly outlined in kpcyrd’s original blog post.
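The distinction between the two properties can be sketched briefly: authentication asks who signed the source, whereas pinning asks whether the content is byte-for-byte what the maintainer previously reviewed. The example below illustrates only the pinning half using a recorded SHA-256 checksum. It is an assumption-laden sketch and does not describe how auth-tarball-from-git works; the function name matches_pin and the sample contents are invented for the example.

```python
import hashlib
import hmac


def matches_pin(tarball: bytes, pinned_sha256: str) -> bool:
    """Return True if the tarball's SHA-256 digest equals the pinned value."""
    actual = hashlib.sha256(tarball).hexdigest()
    # compare_digest is a constant-time comparison; plain == would also work here.
    return hmac.compare_digest(actual, pinned_sha256.lower())


# Hypothetical contents standing in for a release tarball.
original = b"release 1.0 source"
pin = hashlib.sha256(original).hexdigest()

print(matches_pin(original, pin))
print(matches_pin(b"tampered source", pin))
```

A pin of this kind detects a re-issued tag with changed contents, but it says nothing about who produced the original contents, which is why the two checks complement each other.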
diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 212, 213 and 214 to Debian unstable.
Chris also made the following changes:
-
New features:
-
Bug fixes:
-
Codebase improvements:
- Substantially update comment for our calls to zipinfo and zipinfo -v. […]
- Use assert_diff in test_zip over calling get_data with a separate assert. […]
- Don’t call re.compile and then call .sub on the result; just call re.sub directly. […]
- Clarify the comment around the difference between --usage and --help. […]
-
Testsuite improvements:
Vagrant Cascadian added an external tool reference xb-tool for GNU Guix […] as well as updated the diffoscope package in GNU Guix itself […][…][…].
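The re.compile cleanup listed among the codebase improvements above can be illustrated with a small, generic example. The text and pattern below are made up for illustration and are not taken from the diffoscope codebase.

```python
import re

text = "built at 2022-05-01 on host-a"

# Two-step form: compile first, then call .sub on the compiled pattern.
pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
two_step = pattern.sub("<date>", text)

# Direct form: re.sub compiles (and caches) the pattern internally.
direct = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)

print(two_step == direct)
```

Both forms produce the same string; the direct call is simply shorter when the compiled pattern is not reused elsewhere.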
Distribution work
In Debian, 41 reviews of Debian packages were added, 85 were updated and 13 were removed this month, adding to our knowledge about identified issues. A number of issue types have been updated, including adding a new nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue […] as well as ones for mono_mastersummary_xml_files_inherit_filesystem_ordering […], extended_attributes_in_jar_file_created_without_manifest […] and apxs_captures_build_path […].
Vagrant Cascadian performed a rough check of the reproducibility of core package sets in GNU Guix, and in openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
-
Bernhard M. Wiedemann:
- gtkmm-documentation (merged; sorting issue)
- librespot (merged; random BuildID issue)
- lirc (merged)
- lsof (uname/hostname problem)
- solanum (merged, possibly a race condition)
-
Chris Lamb:
-
Johannes Schauer Marin Rodrigues:
-
Vagrant Cascadian:
- #1010462 filed against mtink.
- #1010463 filed against fceux.
- #1010466 filed against glob2.
- #1010483 filed against coinor-cgl.
- #1010486 filed against metapixel.
- #1010781 filed against ragel.
- #1010785 filed against gdome2.
- #1010787 filed against sgml-base-doc.
- #1010789 filed against xarclock.
- #1010790 filed against xgammon.
- #1010825 filed against lwatch.
- #1010828 filed against bbrun.
- #1010830 filed against gscanbus.
- #1010859 filed against libnss-gw-name.
- #1010870 filed against pidgin-blinklight.
- #1010871 filed against dvbtune.
- #1010872 filed against efax.
- #1010944 filed against quelcom.
- #1010948 filed against xine-lib-1.2.
- #1011034 filed against fusesmb.
- #1011036 filed against mailfront.
- #1011104 filed against convlit.
- #1011109 filed against bitstormlite.
- #1011257 filed against coinor-osi.
- #1011402 filed against razor.
- #1011405 filed against autoclass.
- #1011428 filed against cdbackup.
- #1011429 filed against dds2tar.
- #1011469 filed against transcalc.
- #1011470 filed against libapache2-mod-authz-unixgroup.
- #1011471 filed against mgdiff.
- #1011478 filed against scsitools.
- #1011479 filed against fstrcmp.
- #1011480 filed against libxsettings-client.
- #1011481 filed against tamil-gtk2im.
- #1011486 filed against tdfsb.
- #1011488 filed against stymulator.
- #1011489 filed against wiipdf.
- #1011490 filed against gdigi.
- #1011491 filed against getstream.
- #1011493 filed against freecdb.
- #1011495 filed against modglue.
- #1011496 filed against nwall.
- #1011498 filed against parprouted.
- #1011499 filed against imagination.
- #1011500 filed against tuxcmd-modules.
- #1011501 filed against libapache2-mod-authn-yubikey.
- #1011503 filed against libapache2-mod-auth-plain.
- lcsync (remove build paths)
Reproducible builds website
Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but also prepared and published an interview with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix. […][…][…][…]
In addition, Tim Jones added a link to the Talos Linux project […] and billchenchina fixed a dead link […].
Testing framework
The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
-
Holger Levsen:
-
Mattia Rizzolo:
-
Roland Clobus:
And finally, as usual, node maintenance was also performed by Holger Levsen […][…].
Misc news
On our mailing list this month:
-
John Neffenger posted that the early-access release of OpenJDK version 19 build 21 is reproducible.
-
Mattia Rizzolo added a request around tentatively planning a Reproducible Builds summit in 2022.
-
Bernhard M. Wiedemann posted about a Reproducible Builds meetup at the openSUSE conference in Nuremberg.
-
Luca Boccassi asked for help getting arm64 binaries to build reproducibly for the Debian systemd package.
-
Yaobin Wen asked a number of questions in an attempt to discover the best practices for managing Debian .dsc files using reprepro.
Contact
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
-
IRC: #reproducible-builds on irc.oftc.net.
-
Twitter: @ReproBuilds
-
Mailing list: rb-general@lists.reproducible-builds.org