Welcome to the report from the Reproducible Builds project for February 2021. In our monthly reports, we try to outline the most important things that have happened in the world of reproducible builds. If you are interested in contributing to the project, please visit our Contribute page on our website.
Community news
On Sunday 7th February, Jan ‘janneke’ Nieuwenhuizen gave a talk at FOSDEM ‘21 on GNU Mes: Reproducibility is not enough: The missing link between stage0/M2-Planet and Mes. Taking place in the Declarative and Minimalistic Computing devroom, Jan’s talk touched on reproducible builds and how a minimal binary seed further reduces the security attack surface when creating (or “bootstrapping”) a system from scratch.
A few days earlier, Eric Brewer, Rob Pike, Abhishek Arya, Anne Bertucio and Kim Lewandowski wrote a post on the Google Security Blog proposing an industry-wide framework they call “Know, Prevent, Fix” which aims to improve how the industry might think about vulnerabilities in open source software, including “Consensus on metadata and identity standards” and — more relevant to the Reproducible Builds project — “Increased transparency and review for critical software”:
Ken Thompson’s Turing Award lecture famously demonstrated in 1984 that authentic source code alone is not enough, and recent events have shown this attack is a real threat. How do you trust your build system? All the components of it must be trusted and verified through a continuous process of building trust. Reproducible builds help—there is a deterministic outcome for the build and we can thus verify that we got it right—but are harder to achieve due to ephemeral data (such as timestamps) ending up in the release artifact. And safe reproducible builds require verification tools, which in turn must be built verifiably and reproducibly, and so on. We must construct a network of trusted tools and build products. […]
After that, Drew DeVault wrote an interesting blog post titled How to make your downstream users happy, pointing out that “There are a number of things that your FOSS project can be doing which will make the lives of your downstream users easier, particularly if you’re writing a library or programmer-facing tooling”. We concur, especially with Drew’s recommendation to use the Reproducible Builds SOURCE_DATE_EPOCH environment variable.
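As a rough illustration of what honouring SOURCE_DATE_EPOCH looks like in practice, here is a minimal Python sketch of a build step that needs to embed a timestamp. It assumes the variable holds an integer number of seconds since the Unix epoch, as the specification defines it; the build_timestamp helper is hypothetical and only meant to show the pattern.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Return the timestamp a (hypothetical) build step embeds in its output.

    Uses SOURCE_DATE_EPOCH (integer seconds since the Unix epoch) when it is
    set, and only falls back to the current time when it is absent.
    """
    value = os.environ.get("SOURCE_DATE_EPOCH")
    # int() raises ValueError on a malformed value, which aborts the step
    # instead of silently falling back to the wall clock.
    seconds = int(value) if value is not None else int(time.time())
    # Render in UTC so the result does not depend on the builder's timezone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With SOURCE_DATE_EPOCH=1612656000 in the environment, build_timestamp().isoformat() returns '2021-02-07T00:00:00+00:00' on every machine, whatever its clock or timezone says.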
Another blog post this month was written by Alex Birsan where he details a novel supply-chain attack, similar to (but also distinct from) the various typo-squatting attacks that have been increasingly popular in the past year or so. Alex’s post begins with the ominous phrase: “Ever since I started learning how to code, I have been fascinated by the level of trust we put [in] pip install package_name”.
Closer to home, Justin Cappos replied to an email on our mailing list answering the question How we could accelerate deployment of verified reproducible builds?, describing some of the workings of in-toto with regard to the potentially distributed validation of binary signatures. […]
Software development
diffoscope
diffoscope is the Reproducible Builds project’s in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it provides human-readable diffs from many kinds of binary formats. This month, Chris Lamb made a large number of changes (including releasing version 167 and version 168):
- Bug fixes:
  - Don’t call difflib.Differ.compare with very large inputs; it is at least O(n^2) and makes diffoscope (appear to) hang. […]
  - Don’t rely on dumpimage returning an appropriate exit code; check that the file actually exists. […]
  - Don’t rely on magic.Magic to have an identical API between file’s magic.py and PyPI’s python-magic library. […]
- Revamp temporary file handling:
- Testsuite improvements:
  - Strip newlines when determining the Black source code formatter version to avoid “requires black >= 20.8b1 (18.9b0\n detected)” in test output. […]
  - Fix weakref-related handling in Python 3.7 (i.e. Debian buster). […]
  - If our temporary directory does not exist anymore, recreate it. […]
  - Fix FIT-related tests in Debian buster […] and fit_expected_diff […].
  - Gnumeric is back in testing, so re-add it to (test) Build-Depends. […]
  - Mark test_apk.py::test_android_manifest as being allowed to fail for now. […]
  - Add u-boot-tools to (test) Build-Depends so salsa.debian.org pipelines test the new U-Boot FIT comparator. […]
  - Move to the assert_diff utility in a number of tests. […][…]
- Codebase improvements:
  - Correct capitalisation of ‘jQuery’. […]
  - Update various copyright years. […]
  - Tidy imports in diffoscope.comparators.fit. […]
  - Don’t use ‘Inheriting PATH of X’, use ‘PATH is X’ in logging messages. […]
  - Drop unused Config.acl and Config.xattr attributes […] and set a default Config.extended_filesystem_attributes. […]
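The first bug fix above comes down to not handing very large inputs to difflib.Differ, whose compare method also compares the lines of every replaced block pairwise to produce intraline hints. A minimal sketch of such a guard follows; the line-count threshold and the fallback to difflib.unified_diff are assumptions for illustration, not diffoscope’s actual code or limit.

```python
import difflib
from typing import Iterator, List

# Hypothetical cut-off chosen for illustration only.
MAX_LINES_FOR_DIFFER = 10_000


def line_diff(a: List[str], b: List[str]) -> Iterator[str]:
    """Diff two lists of lines, avoiding difflib.Differ on very large inputs."""
    if len(a) + len(b) <= MAX_LINES_FOR_DIFFER:
        # Differ.compare yields '  ', '- ', '+ ' and '? ' prefixed lines and
        # compares lines within replaced blocks pairwise, which can be slow.
        return difflib.Differ().compare(a, b)
    # Past the cut-off, fall back to a unified diff without intraline hints.
    return difflib.unified_diff(a, b, lineterm="")
```

unified_diff still relies on SequenceMatcher, but it skips Differ’s pairwise line matching, trading the ‘?’ hint lines for staying responsive on large files.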
Vagrant Cascadian updated diffoscope in GNU Guix to versions 165 […], 166 […] and 167 […].
Mattia Rizzolo updated diffoscope in Debian buster-backports to version 166~bpo10+1.
Debian
Roland Clobus created a page on the Debian Wiki to detail his progress in creating reproducible “live” images (i.e. bootable USB sticks, etc.). In his post to our mailing list, Roland included a short summary:
The ‘standard’ image is reproducible, if fontconfig and mdadm are patched. For fontconfig I’ve created a patch that works for live-build, but not for all other tools that might need it. For mdadm I’m finalizing a patch.
Elsewhere, the apt-transport-in-toto package (an add-on for APT to use in-toto supply-chain verifications) is now available in the bullseye distribution for the first time and will, therefore, be included in the next stable release of Debian.
Holger Levsen suggested the creation of a partial mirror of snapshot.debian.org (a service needed to rebuild Debian packages) to work around problems with the widespread adoption of the snapshot.debian.org site […]. In addition, a new metasnap.debian.net service was announced in a recent edition of Misc Developer News. This new offering is designed to complement the existing snapshot.debian.org service to answer questions such as:
- Given a certain timestamp, which version of a certain package was in a given suite at that time?
- Given a versioned package, in which suite was that package present during which periods of time?
- Given a package and a suite name, which versions were present in that suite during which times?
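All three questions can be answered from a table of intervals recording when each package version entered and left each suite. The Python sketch below models that idea in memory; the Presence record, the query functions and any data fed to them are hypothetical and do not reflect metasnap.debian.net’s actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

Span = Tuple[str, datetime, Optional[datetime]]


@dataclass(frozen=True)
class Presence:
    """One period during which a package version was present in a suite."""
    package: str
    version: str
    suite: str
    start: datetime
    end: Optional[datetime]  # None: still present


def version_at(rows: List[Presence], package: str, suite: str,
               when: datetime) -> Optional[str]:
    """Which version of `package` was in `suite` at time `when`?"""
    for r in rows:
        if (r.package, r.suite) == (package, suite) and r.start <= when and (
                r.end is None or when < r.end):
            return r.version
    return None


def suites_of(rows: List[Presence], package: str, version: str) -> List[Span]:
    """In which suites was this package version present, and when?"""
    return [(r.suite, r.start, r.end) for r in rows
            if (r.package, r.version) == (package, version)]


def versions_in(rows: List[Presence], package: str, suite: str) -> List[Span]:
    """Which versions of `package` were in `suite`, and when?"""
    return [(r.version, r.start, r.end) for r in rows
            if (r.package, r.suite) == (package, suite)]
```

The linear scans keep the sketch short; a real service covering all of Debian’s history would need an indexed database instead.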
45 reviews of Debian packages were added, 39 were updated and 28 were removed this month adding to our knowledge about identified issues. Two issue types were added by Chris Lamb: build_path_in_documentation_generated_by_pdflatex and build_path_in_record_file_generated_by_pybuild_flit_plugin.
Other distributions
The Yocto Project has continued working on improving reproducibility. They now have a live webpage which shows reproducibility statistics directly from their CI system and have added this to the Reproducible Builds Continuous tests page. When the CI system detects differences in the output, it automatically generates diffoscope reports and shares these in order to help developers understand the cause of issues and help fix them.
As well as the previously reported .deb and .ipk output, .rpm output is now also being tested in Yocto, and for OpenEmbedded-Core, 34,335 out of 34,392 packages are now reproducible. The differences are limited to code using the Go programming language (which isn’t reproducible at present), perf and three other packages which are exhibiting minor issues.
Bernhard M. Wiedemann posted his monthly reproducible builds status report for the openSUSE distribution, which had a number of follow-ups on the topic of unique identifiers in PDF files and SOURCE_DATE_EPOCH. Bernhard also packaged dettrace (covered in a previous month’s report) for openSUSE […].
Marek Marczykowski-Górecki wrote a lengthy blog post about the development process of Qubes-OS titled “Improvements in testing and building: GitLab CI and reproducible builds”. Marek describes the problem solved by reproducible builds as follows:
[Imagine] that an attacker wishes to feed unsuspecting users a compromised package. The attacker knows that the source code is public, so any malicious code he inserts into it would be highly exposed and at risk of detection. On the other hand, he reasons, compromising the build infrastructure would allow him to surreptitiously insert malicious changes that would make it into the resultant package. Since the source code remains untouched, his malicious changes are less likely to be detected. This is where the value of reproducible builds comes in. If the build process is reproducible, then we will immediately notice that building a package from the untouched source code results in a package that is different from the compromised one. This would be a major red flag that would prompt an immediate security investigation. […]
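The check Marek describes amounts to rebuilding from the untouched source and comparing the result bit for bit with the published package. A minimal Python sketch, assuming both packages are available as local files (the helper names are hypothetical), compares their SHA-256 digests:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rebuild_matches(published: Path, rebuilt: Path) -> bool:
    """True if an independent rebuild is bit-for-bit identical to the published package."""
    return sha256_of(published) == sha256_of(rebuilt)
```

A False result is the “red flag” from the quote; in practice this is where a tool such as diffoscope comes in, to show what differs rather than merely that something does.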
In Fedora, Frédéric Pierret restarted a discussion regarding .buildinfo files for RPM, and made disorderfs and reprotest available in the official Fedora repos.
In NixOS, Tom Berek made the date in the asciidoc manpages deterministic and Arnout Engelen made sure that squashfs images are reproducible, regardless of the presence of hard links. Open PRs towards the milestone of a fully-reproducible minimal installation ISO include ones for gcc and python.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - automake/pcre (filesystem and ASLR-related issue)
  - calc (required fixes for non-Intel CPUs)
  - cpan (date-related issue)
  - gmic (address a copyright year)
  - gocr (randomisation issue)
  - HTTP (fix build failures after June 2021)
  - ibus (parallelism issue)
  - ipxe (date issue, random issue)
  - ipxe (modification time issue)
  - jpype (sort a Python-based filesystem ordering)
  - lagrange (CPU-related issue)
  - libsoup (fix build failing in 2027)
  - openscap (modification time issue)
  - scap-security-guide
  - syslinux/isohybrid (fix a nondeterministic MBR ID for ipxe.iso)
- Chris Lamb:
  - #981570 filed against crossfire.
  - #981571 filed against zmk.
  - #982529 filed against python-aiosqlite.
  - #982851 filed against mocassin (forwarded upstream).
  - #983033 filed against golang-github-revel-revel.
  - #983046 filed against kjs.
  - #983163 filed against golang-github-viant-toolbox.
- Vagrant Cascadian:
  - #983126 filed against iptotal.
  - #983138 filed against ypserv.
  - #983142 filed against circlator.
  - #983147 filed against armagetronad.
  - #983148 filed against wxmaxima.
  - #983202 filed against time.
  - #983208 & #983209 filed against lynx.
  - #983302 & #983303 filed against imagemagick.
  - #983584 filed against paraview.
  - #983588 filed against xmlgraphics-commons.
Testing framework
The Reproducible Builds project operates a Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:
- Frédéric Pierret (Qubes-OS):
- Holger Levsen:
  - Switch the ionos7 host to Debian bullseye […] and update the PostgreSQL-related packages for a .buildinfo-related service hosted on Debian bullseye too […].
  - Improve the deploy_jdn script, adding support for short options […][…], conditional deployment […] and some general code improvements […][…].
  - Fix failed networking and “pbuilder_create scope” issues in the node health check system. […]
  - Move more IRC notifications to the #reproducible-changes channel […] and be verbose about sleeping time. […]
  - Package rebuilder prototype:
    - Drop a reference and workaround to a Debian bug related to signed .buildinfo files (#955050) as it has been fixed upstream. […][…]
    - Remove a workaround that was previously needed for the version of sbuild in Debian buster. […]
    - Use debrebuild --builder=sbuild to better mimic the behaviour of the official Debian build servers. […]
    - Make some miscellaneous code improvements. […][…]
Lastly, build node maintenance was performed by Holger Levsen […][…][…][…], Mattia Rizzolo […][…][…][…] and Vagrant Cascadian […][…].
Other development news
On our website this month, Holger Levsen added a public reproducible-builds-developers-keys.asc file which contains the GPG keys used by some Reproducible Builds developers […] and Joshua Watt added a link to Yocto Project’s reproducible builds summary. […]
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb uploaded version 1.11.0-1 to Debian unstable, notably to include a contribution from Helmut Grohne in order to normalise PO-Revision-Date fields (in addition to POT-Creation-Date) in GNU gettext translation files (#981895).
In a thread on our mailing list which was started to discuss potential ideas for Outreachy, Chris Lamb mentioned that he had been working on a proof-of-concept for a tool to automatically classify issues from the output of diffoscope and has added it to the reproducible-notes.git repository. […]
reprotest is the Reproducible Builds project’s end-user tool to build the same source code twice in widely differing environments, checking the binaries produced by the builds for any differences. This month, Frédéric Pierret made a number of changes to its RPM spec file […][…] and improved the testsuite in a handful of ways […][…]. Vagrant Cascadian then updated the version in GNU Guix. […]
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter (@ReproBuilds) & Mastodon (@reproducible_builds@fosstodon.org)
- Reddit: /r/ReproducibleBuilds
- Mailing list: rb-general@lists.reproducible-builds.org