Reproducible Builds in February 2021

View all our monthly reports


Welcome to the report from the Reproducible Builds project for February 2021. In our monthly reports, we try to outline the most important things that have happened in the world of reproducible builds. If you are interested in contributing to the project, please visit our Contribute page on our website.

Community news

On Sunday 7th February, Jan ‘janneke’ Nieuwenhuizen gave a talk at FOSDEM ‘21 on GNU Mes: Reproducibility is not enough: The missing link between stage0/M2-Planet and Mes. Taking place in the Declarative and Minimalistic Computing devroom, Jan’s talk touched on reproducible builds and how a minimal binary seed further reduces the security attack surface when creating (or “bootstrapping”) a system from scratch.


A few days earlier, Eric Brewer, Rob Pike, Abhishek Arya, Anne Bertucio and Kim Lewandowski wrote a post on the Google Security Blog proposing an industry-wide framework they call “Know, Prevent, Fix” which aims to improve how the industry might think about vulnerabilities in open source software, including “Consensus on metadata and identity standards” and — more relevant to the Reproducible Builds project — “Increased transparency and review for critical software”:

Ken Thompson’s Turing Award lecture famously demonstrated in 1984 that authentic source code alone is not enough, and recent events have shown this attack is a real threat. How do you trust your build system? All the components of it must be trusted and verified through a continuous process of building trust. Reproducible builds help—there is a deterministic outcome for the build and we can thus verify that we got it right—but are harder to achieve due to ephemeral data (such as timestamps) ending up in the release artifact. And safe reproducible builds require verification tools, which in turn must be built verifiably and reproducibly, and so on. We must construct a network of trusted tools and build products.


After that, Drew DeVault wrote an interesting blog post titled How to make your downstream users happy, pointing out that “There are a number of things that your FOSS project can be doing which will make the lives of your downstream users easier, particularly if you’re writing a library or programmer-facing tooling”. We concur, especially with Drew’s recommendation to use the Reproducible Builds project’s SOURCE_DATE_EPOCH environment variable.
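
As a rough illustration of what honouring SOURCE_DATE_EPOCH can look like, here is a minimal Python sketch. The function names are hypothetical and are not taken from Drew's post; the only assumption is that the variable, when set, holds an integer number of seconds since the Unix epoch.

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the timestamp to embed in build output (seconds since the epoch).

    Prefers SOURCE_DATE_EPOCH when it is set and falls back to the
    current time otherwise.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def build_date_string(environ=os.environ):
    # Format in UTC so the result does not depend on the builder's timezone.
    return time.strftime("%Y-%m-%d", time.gmtime(build_timestamp(environ)))
```

With the variable set, two builds run at different times embed the same date; without it, the sketch falls back to the current time, which is the usual source of timestamp-related differences.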


Another blog post this month was written by Alex Birsan where he details a novel supply-chain attack, similar to (but also distinct from) the various typo-squatting attacks that have been increasingly popular in the past year or so. Alex’s post begins with the ominous phrase: “Ever since I started learning how to code, I have been fascinated by the level of trust we put [in] pip install package_name”.


Closer to home, Justin Cappos replied to an email on our mailing list that asked “How we could accelerate deployment of verified reproducible builds?”, describing some of the workings of in-toto with regard to the potentially distributed validation of binary signatures.


Software development

diffoscope

diffoscope is the Reproducible Builds project’s in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it provides human-readable diffs from many kinds of binary formats. This month, Chris Lamb made a large number of changes (including releasing version 167 and version 168):

  • Bug fixes:

    • Don’t call difflib.Differ.compare with very large inputs; it is at least O(n^2) and makes diffoscope (appear to) hang.
    • Don’t rely on dumpimage returning an appropriate exit code; check that the file actually exists.
    • Don’t rely on magic.Magic to have an identical API between file’s magic.py and PyPI’s python-magic library.
  • Revamp temporary file handling:

    • Ensure we clean up our temporary directory by avoiding confusion between the TemporaryDirectory instance and the underlying directory. (#981123)
    • Try to use a potentially-useful suffix for our temporary directory.
  • Testsuite improvements:

    • Strip newlines when determining the Black source code formatter version to avoid “requires black >= 20.8b1 (18.9b0\n detected)” in test output.
    • Fix weakref-related handling in Python 3.7 (i.e. Debian buster).
    • If our temporary directory does not exist anymore, recreate it.
    • Fix FIT-related tests in Debian buster and fit_expected_diff.
    • Gnumeric is back in testing so re-add to (test) Build-Depends.
    • Mark test_apk.py::test_android_manifest as being allowed to fail for now.
    • Add u-boot-tools to (test) Build-Depends so salsa.debian.org pipelines test the new U-Boot FIT comparator.
    • Move to assert_diff utility in a number of tests.
  • Codebase improvements:

    • Correct capitalisation of ‘jQuery’.
    • Update various copyright years.
    • Tidy imports in diffoscope.comparators.fit.
    • Don’t use “Inheriting PATH of X”; use “PATH is X” in logging messages.
    • Drop unused Config.acl and Config.xattr attributes and set a default Config.extended_filesystem_attributes.
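
The first bug fix above concerns the cost of difflib.Differ.compare on large inputs. One way to guard against that cost can be sketched as follows; this is an illustration only, not diffoscope's actual code, and both the threshold and the fallback message are hypothetical.

```python
import difflib

# Hypothetical threshold; not a value taken from diffoscope.
MAX_LINES_FOR_DIFFER = 1000


def line_diff(a, b, max_lines=MAX_LINES_FOR_DIFFER):
    """Return a line-by-line comparison of two lists of lines.

    Falls back to a one-line summary when either input is too large for
    difflib.Differ, whose running time grows at least quadratically.
    """
    if len(a) > max_lines or len(b) > max_lines:
        return ["(inputs too large to diff: {} vs {} lines)".format(len(a), len(b))]
    return list(difflib.Differ().compare(a, b))
```

Small inputs still get the detailed Differ output, while oversized ones return immediately instead of appearing to hang.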

Vagrant Cascadian updated diffoscope in GNU Guix to versions 165, 166 and 167.

Mattia Rizzolo updated diffoscope in Debian buster-backports to version 166~bpo10+1.

Debian

Roland Clobus created a page on the Debian Wiki to detail his progress in creating reproducible “live” images (i.e. bootable USB sticks, etc.). In his post to our mailing list, Roland included a short summary, part of which reads:

The ‘standard’ image is reproducible, if fontconfig and mdadm are patched. For fontconfig I’ve created a patch that works for live-build, but not for all other tool that might who need it. For mdadm I’m finalizing a patch.

Elsewhere, the apt-transport-in-toto package (an add-on for APT to use in-toto supply-chain verifications) is now available in the bullseye distribution for the first time and will, therefore, be included in the next stable release of Debian.

Holger Levsen suggested the creation of a partial mirror of snapshot.debian.org (a service needed to rebuild Debian packages) to work around problems with the widespread adoption of the snapshot.debian.org site. In addition, a new metasnap.debian.net service was announced in a recent edition of Misc Developer News. This new offering is designed to complement the existing snapshot.debian.org service to answer questions such as:

  • Given a certain timestamp, which version of a certain package was in a given suite at that time?
  • Given a versioned package, in which suite was that package present during which periods of time?
  • Given a package and a suite name, which versions were present in that suite during which times?
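
To make the three questions concrete, here is a small Python sketch that answers them over an in-memory list of records. It does not use or describe the metasnap.debian.net API; the record layout, the function names and the sample data (versions and dates) are all made up for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class Presence(NamedTuple):
    package: str
    version: str
    suite: str
    first_seen: datetime
    last_seen: datetime


# Made-up example data for illustration only.
RECORDS = [
    Presence("hello", "2.10-1", "unstable", datetime(2020, 1, 1), datetime(2020, 6, 1)),
    Presence("hello", "2.10-2", "unstable", datetime(2020, 6, 1), datetime(2021, 2, 1)),
    Presence("hello", "2.10-1", "testing", datetime(2020, 1, 10), datetime(2020, 6, 10)),
]


def version_at(records, package, suite, when):
    """Which version(s) of a package were in a suite at a given time?"""
    return [r.version for r in records
            if r.package == package and r.suite == suite
            and r.first_seen <= when < r.last_seen]


def suites_for(records, package, version):
    """In which suites, and during which periods, was a versioned package present?"""
    return [(r.suite, r.first_seen, r.last_seen) for r in records
            if r.package == package and r.version == version]


def versions_in(records, package, suite):
    """Which versions were present in a suite, and during which times?"""
    return [(r.version, r.first_seen, r.last_seen) for r in records
            if r.package == package and r.suite == suite]
```

Each function corresponds to one of the questions in the list above, in the same order.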

45 reviews of Debian packages were added, 39 were updated and 28 were removed this month, adding to our knowledge about identified issues. Two issue types were added by Chris Lamb: build_path_in_documentation_generated_by_pdflatex and build_path_in_record_file_generated_by_pybuild_flit_plugin.

Other distributions

The Yocto Project has continued working on improving reproducibility. They now have a live webpage which shows reproducibility statistics directly from their CI system and have added this to the Reproducible Builds Continuous tests page. When the CI system detects differences in the output, it automatically generates diffoscope reports and shares these in order to help developers understand the cause of issues and help fix them.

In addition to the previously reported .deb and .ipk output, .rpm output is now also being tested in Yocto. For OpenEmbedded-Core, 34,335 out of 34,392 packages are now reproducible. The differences are limited to code using the Go programming language (which isn’t reproducible at present), perf and three other packages which are exhibiting minor issues.

Bernhard M. Wiedemann posted his monthly reproducible builds status report for the openSUSE distribution, which had a number of followups on the topic of unique identifiers in PDF files and SOURCE_DATE_EPOCH. Bernhard also packaged dettrace (covered in a previous month’s report) for openSUSE.

Marek Marczykowski-Górecki wrote a lengthy blog post about the development process of Qubes-OS titled “Improvements in testing and building: GitLab CI and reproducible builds”. Marek describes the problem solved by reproducible builds as follows:

[Imagine] that an attacker wishes to feed unsuspecting users a compromised package. The attacker knows that the source code is public, so any malicious code he inserts into it would be highly exposed and at risk of detection. On the other hand, he reasons, compromising the build infrastructure would allow him to surreptitiously insert malicious changes that would make it into the resultant package. Since the source code remains untouched, his malicious changes are less likely to be detected. This is where the value of reproducible builds comes in. If the build process is reproducible, then we will immediately notice that building a package from the untouched source code results in a package that is different from the compromised one. This would be a major red flag that would prompt an immediate security investigation.
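
The check Marek describes boils down to comparing a distributed package with an independent rebuild from the same source. A minimal Python sketch of that comparison follows; the function names are hypothetical, and a real verification would also need to pin the build environment.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_rebuild(distributed_path, rebuilt_path):
    """True if the distributed package is bit-for-bit identical to our rebuild."""
    return sha256_of(distributed_path) == sha256_of(rebuilt_path)
```

A False result here is the "major red flag" from the quote: either the build is not yet reproducible or one of the two artifacts was not built from the published source.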

In Fedora, Frédéric Pierret restarted a discussion regarding .buildinfo files for RPM, and made disorderfs and reprotest available in the official Fedora repos.

In NixOS, Tom Berek made the date in the asciidoc manpages deterministic and Arnout Engelen made sure that squashfs images are reproducible, regardless of the presence of hard links. Remaining work towards the milestone of a fully-reproducible minimal installation ISO includes open PRs for gcc and python.

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework

The Reproducible Builds project operates a Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

  • Frédéric Pierret (Qubes-OS):

    • Add a new buildinfos_suites job.
    • Adjust the ARCHES and per-suite list for our new job.
  • Holger Levsen:

    • Switch the ionos7 host to Debian bullseye and update the PostgreSQL-related packages for a .buildinfo-related service hosted on Debian bullseye too.
    • Improve the deploy_jdn script, adding support for short options, conditional deployment and some general code improvements.
    • Fix failed networking and “pbuilder_create scope” issues in the node health check system.
    • Move more IRC notifications to the #reproducible-changes channel and be verbose about sleeping time.

    • Package rebuilder prototype:

      • Drop a reference to, and a workaround for, the Debian bug related to signed .buildinfo files (#955050), as it has been fixed upstream.
      • Remove a workaround that was previously needed for the version of sbuild in Debian buster.
      • Use debrebuild --builder=sbuild to better mimic the behaviour of the official Debian build servers.
      • Make some miscellaneous code improvements.

Lastly, build node maintenance was performed by Holger Levsen, Mattia Rizzolo and Vagrant Cascadian.

Other development news

On our website this month, Holger Levsen added a public reproducible-builds-developers-keys.asc file which contains the GPG keys used by some Reproducible Builds developers, and Joshua Watt added a link to Yocto Project’s reproducible builds summary.

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb uploaded version 1.11.0-1 to Debian unstable, notably to include a contribution from Helmut Grohne in order to normalise PO-Revision-Date fields (in addition to POT-Creation-Date) in GNU gettext translation files (#981895).
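
To show the idea behind normalising these two fields, here is a small Python sketch. It is not strip-nondeterminism's implementation (which is written in Perl) and it does not reflect the file formats that tool actually processes: it assumes a textual gettext header and simply rewrites both date fields to one fixed UTC timestamp. The function name and the choice of timestamp are assumptions.

```python
import re
import time

# Matches the value of the two date fields in a textual gettext header, e.g.
#   "PO-Revision-Date: 2021-02-03 11:22+0200\n"
DATE_FIELD = re.compile(
    r'^("(?:POT-Creation-Date|PO-Revision-Date): )[^\\"]*',
    re.MULTILINE,
)


def normalise_po_dates(text, epoch):
    """Replace both date fields with a fixed UTC timestamp derived from epoch."""
    fixed = time.strftime("%Y-%m-%d %H:%M+0000", time.gmtime(epoch))
    return DATE_FIELD.sub(lambda m: m.group(1) + fixed, text)
```

Running the function twice with the same epoch leaves the text unchanged, which is the property a normalisation step needs.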

In a thread on our mailing list which was started to discuss potential ideas for Outreachy, Chris Lamb mentioned that he had been working on a proof-of-concept for a tool to automatically classify issues from the output of diffoscope and has added it to the reproducible-notes.git repository.

reprotest is the Reproducible Builds project’s end-user tool to build the same source code twice in widely differing environments, checking the binaries produced by the builds for any differences. This month, Frédéric Pierret made a number of changes to its RPM spec file and improved the testsuite in a handful of ways. Vagrant Cascadian then updated the version in GNU Guix.
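
The build-twice-and-compare approach can be sketched in a few lines of Python. This is not how reprotest itself is implemented; the function names and the tiny set of environment variations (timezone and locale only) are hypothetical, and reprotest varies far more than this.

```python
import hashlib
import os
import subprocess
import tempfile

# Two deliberately different environments (a hypothetical, minimal set).
VARIATIONS = [
    {"TZ": "UTC", "LC_ALL": "C"},
    {"TZ": "Pacific/Kiritimati", "LC_ALL": "C.UTF-8"},
]


def build_twice(command, output_name):
    """Run command in two fresh directories and environments.

    Returns the SHA-256 digests of the file named output_name that each
    run is expected to produce in its working directory.
    """
    digests = []
    for variation in VARIATIONS:
        with tempfile.TemporaryDirectory() as workdir:
            env = dict(os.environ, **variation)
            subprocess.run(command, cwd=workdir, env=env, check=True)
            with open(os.path.join(workdir, output_name), "rb") as f:
                digests.append(hashlib.sha256(f.read()).hexdigest())
    return digests


def is_reproducible(command, output_name):
    first, second = build_twice(command, output_name)
    return first == second
```

A build command whose output embeds the timezone or locale would make is_reproducible return False, pointing at an environment leak.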


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:



