Reproducible Builds in October 2021

View all our monthly reports


Welcome to the October 2021 report from the Reproducible Builds project!


This month, Samanta Navarro posted to the oss-security mailing list about a novel category of exploit in the .tar archive format, where a single .tar file yields different contents depending on the tar utility used to read it. Naturally, this has consequences for reproducible builds, as Samanta goes on to explain:

Arch Linux uses libarchive (bsdtar) in its build environment. The default tar program installed is GNU tar. It is possible to create a source distribution which leads to different files seen by the build environment than compared to a careful reviewer and other Linux distributions.

Samanta notes that addressing the tar utilities themselves will not be a sufficient fix:

I have submitted bug reports and patches to some projects but eventually I had to conclude that the problem itself cannot be fixed by these implementations alone. The best choice for these tools would be to only allow archives which are fully compatible to standards but this in turn would render a lot of archives broken.

Reproducible builds, with its twin ideas of reaching consensus on the build outputs as well as precisely recording and describing the build environment, would help address this problem at a higher level.
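As a rough illustration of what "reaching consensus on the build outputs" could mean for an archive, the following sketch records what one particular parser (Python's tarfile module) sees inside a tarball. The function names and the example archive are hypothetical and are not taken from Samanta's post. A manifest like this only describes a single tool's view, so it would need to be compared against the view of every other tar implementation that matters.

```python
import hashlib
import io
import tarfile


def tar_manifest(data: bytes) -> dict:
    """Map each regular file's name to the SHA-256 of its contents,
    as seen by Python's tarfile module (one parser among many)."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                content = archive.extractfile(member).read()
                manifest[member.name] = hashlib.sha256(content).hexdigest()
    return manifest


def build_example_archive() -> bytes:
    """Create a tiny in-memory archive so the sketch is self-contained."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="src/hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    for name, digest in tar_manifest(build_example_archive()).items():
        print(digest, name)
```

If two tools produce different manifests for the same archive, that archive is a candidate for the kind of ambiguity described above.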


Codethink announced that they had achieved ISO-26262 ASIL D Tool Certification, a way of determining specific safety standards for software. Codethink used open source tooling to achieve this, but they also leveraged:

Reproducibility, repeatability and traceability of builds, drawing heavily on best-practices championed by the Reproducible Builds project.


Elsewhere on the internet, according to a comment on Hacker News, Microsoft are now comparing NPM JavaScript packages with their original source repositories:

I got a PR in my repository a few days ago leading back to a team trying to make it easier for packages to be reproducible from source.


Lastly, Martin Monperrus started an interesting thread on our mailing list about GitHub, specifically that their “autogenerated release tarballs are not deterministic”. The thread generated a significant number of replies that are worth reading.
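For readers wondering what a deterministic tarball involves in practice, here is a minimal sketch using only the Python standard library. It is not how GitHub generates its archives, and the function name is hypothetical. It pins down the usual sources of variation (member order, timestamps, ownership and the gzip header), although the compressed bytes may still differ between machines with different zlib versions or compression settings.

```python
import gzip
import hashlib
import io
import tarfile


def deterministic_tar_gz(files: dict, mtime: int = 0) -> bytes:
    """Pack {name: bytes} into a .tar.gz whose bytes depend only on the inputs."""
    tar_buffer = io.BytesIO()
    # Choose the tar format explicitly, since the default may differ between Python versions.
    with tarfile.open(fileobj=tar_buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # no wall-clock timestamps
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    gz_buffer = io.BytesIO()
    # An empty filename and a fixed mtime keep varying metadata out of the gzip header.
    with gzip.GzipFile(filename="", fileobj=gz_buffer, mode="wb", mtime=mtime) as gz:
        gz.write(tar_buffer.getvalue())
    return gz_buffer.getvalue()


if __name__ == "__main__":
    blob = deterministic_tar_gz({"b.txt": b"two\n", "a.txt": b"one\n"})
    print(hashlib.sha256(blob).hexdigest())
```

Calling the function twice with the same files, in any insertion order, should yield identical bytes on the same machine, which is the property a checksum-pinned release tarball relies on.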

Events and presentations

Community news

On our mailing list this month:


There were quite a few changes to the Reproducible Builds website and documentation this month as well, including Feng Chai updating some links on our ‘publications’ page and marco updating our project metadata around the Bitcoin Core building guide.


Lastly, we ran another productive meeting on IRC during October. A full set of notes from the meeting is available to view.


Distribution work

Qubes was heavily featured in the latest edition of Linux Weekly News, and a significant section was dedicated to discussing reproducibility. For example, it was mentioned that the “Qubes project has been working on incorporating reproducible builds into its continuous integration (CI) infrastructure”. The LWN article goes on to explain that:

The current goal is to be able to build the Qubes OS Debian templates solely from packages that can be built reproducibly. Templates in Qubes OS are VM images that can be used to start an application qube quickly based on the template. The qube will have read-only access to the root filesystem of the template, so that the same root filesystem can be shared with multiple application qubes. There are official templates for several variants of both Fedora and Debian, as well as community maintained templates for several other distributions.

You can view the whole article on LWN, and Frédéric also published a lengthy summary about their work on reproducible builds in Qubes for those wishing to learn more.


In Debian this month, 133 reviews of Debian packages were added, 81 were updated and 24 were removed, adding to Debian’s ever-growing knowledge about identified issues. A number of issues were categorised and added by Chris Lamb and Vagrant Cascadian too. In addition, Frédéric Pierret and Holger Levsen made progress on an alternative snapshot service this month, including moving from the existing host (snapshot.notset.fr) to snapshot.reproducible-builds.org. Thanks to OSUOSL for the machine and hosting, and to Debian for the disks.


Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.


diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 186, 187, 188 and 189 to Debian:

  • New features:

    • Add support for Python Sphinx inventory files (usually named objects.inv on-disk).
    • Add support for comparing .pyc files. Thanks to Sergei Trofimovich for the inspiration.
    • Try some alternative suffixes (e.g. .py) to support distributions that strip or retain them.
  • Bug fixes:

    • Fix Python decompilation tests under Python 3.10+ and for Python 3.7.
    • Don’t raise a traceback if we cannot unmarshal Python bytecode. This is in order to support Python 3.7 failing to load .pyc files generated with newer versions of Python.
    • Skip Python bytecode testing where we do not have an expected diff.
  • Codebase improvements:

    • Use our file_version_is_lt utility instead of accepting both versions of uImage expected diff.
    • Split out a custom call to assert_diff for a .startswith equivalent.
    • Use skipif instead of manual conditionals in some tests.

In addition, Jelle van der Waa added external tool references for Arch Linux for ocamlobjinfo, openssl and ffmpeg, and added Arch Linux as a Continuous Integration (CI) test target. Vagrant Cascadian updated the testsuite to skip Python bytecode comparisons when file(1) is older than 5.39, added external tool references for the Guix distribution for dumppdf and ppudump, and updated the diffoscope package in GNU Guix.

Lastly, Guangyuan Yang updated the FreeBSD package name on the website, Mattia Rizzolo made a change to override a new Lintian warning due to the new test files, Roland Clobus added support to detect and log if the GNU_BUILD_ID field in an ELF binary has been modified, Sandro Jäckel updated a number of helpful links on the website and Sergei Trofimovich made the uImage test output support file(1) version 5.41.
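To illustrate the .pyc-related changes listed earlier in this section, here is a small sketch of reading a bytecode file without raising a traceback when the bytecode cannot be unmarshalled. It is not diffoscope's actual implementation, and describe_pyc and make_pyc are hypothetical names. It assumes the 16-byte header layout used since Python 3.7, and note that the marshal module is not intended to be safe against maliciously constructed data.

```python
import importlib.util
import marshal
import struct


def describe_pyc(data: bytes) -> str:
    """Summarise a .pyc file, degrading gracefully if the bytecode cannot be loaded."""
    if len(data) < 16:
        return "too short to be a .pyc file"
    if data[:4] == importlib.util.MAGIC_NUMBER:
        note = "magic number matches the running interpreter"
    else:
        note = "magic number differs from the running interpreter"
    try:
        # The marshalled code object follows the 16-byte header.
        code = marshal.loads(data[16:])
    except (EOFError, ValueError, TypeError) as exc:
        # Report the failure as part of the result instead of propagating it.
        return f"{note}; could not unmarshal bytecode ({type(exc).__name__})"
    return f"{note}; top-level code object: {getattr(code, 'co_name', type(code).__name__)}"


def make_pyc(source: str) -> bytes:
    """Build an in-memory, timestamp-style .pyc so the sketch is self-contained."""
    code = compile(source, "<example>", "exec")
    # Header: magic (4 bytes), flags, source mtime and source size (4 bytes each).
    header = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, 0, len(source.encode()))
    return header + marshal.dumps(code)


if __name__ == "__main__":
    good = make_pyc("x = 1\n")
    print(describe_pyc(good))
    print(describe_pyc(good[:16]))  # header only, bytecode truncated
```

The design choice mirrored here is to treat an unreadable code object as one more fact to report, so that a comparison can fall back to a less detailed view of the file.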


reprotest

reprotest is the Reproducible Builds project’s end-user tool to build the same source code twice in widely differing environments, checking the binaries produced by the builds for any differences.

This month, reprotest version 0.7.18 was uploaded to Debian unstable by Holger Levsen. It included a change by Holger to clarify that Python 3.9 is used nowadays, as well as two changes by Vasyl Gello: one to implement “realistic” CPU architecture shuffling and one to log the selected variations when the verbosity is configured at a sufficiently high level. Finally, Vagrant Cascadian updated reprotest to version 0.7.18 in GNU Guix.
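The underlying idea of building twice under different conditions and comparing the results can be sketched in a few lines. This is a toy illustration and not reprotest's code or interface: the variations, function names and the two example "builds" below are all hypothetical, and the real tool varies far more of the environment than a handful of variables.

```python
import hashlib
import os
import subprocess
import sys
import tempfile

# Hypothetical environment variations applied to the two builds.
VARIATIONS = [
    {"TZ": "UTC", "LANG": "C", "HOME": "/nonexistent-first"},
    {"TZ": "Pacific/Kiritimati", "LANG": "fr_CH.UTF-8", "HOME": "/nonexistent-second"},
]

# Two toy "builds": one writes a constant, the other leaks $HOME into its output.
STABLE_BUILD = [sys.executable, "-c", "open('out.bin', 'w').write('constant')"]
LEAKY_BUILD = [sys.executable, "-c", "import os; open('out.bin', 'w').write(os.environ['HOME'])"]


def build_and_hash(command, artifact, extra_env):
    """Run the build in a fresh directory and return the SHA-256 of its artifact."""
    with tempfile.TemporaryDirectory() as workdir:
        env = dict(os.environ, **extra_env)
        subprocess.run(command, cwd=workdir, env=env, check=True)
        with open(os.path.join(workdir, artifact), "rb") as handle:
            return hashlib.sha256(handle.read()).hexdigest()


def is_reproducible(command, artifact):
    """Build once per variation and check that every digest is identical."""
    digests = {build_and_hash(command, artifact, env) for env in VARIATIONS}
    return len(digests) == 1


if __name__ == "__main__":
    print("stable build reproducible:", is_reproducible(STABLE_BUILD, "out.bin"))
    print("leaky build reproducible:", is_reproducible(LEAKY_BUILD, "out.bin"))
```

When the digests differ, a tool such as diffoscope can then be used to find out where the two artifacts diverge.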


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix unreproducible packages. We try to send all of our patches upstream where appropriate. We authored a large number of such patches this month, including:


Testing framework

The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:

  • Holger Levsen:

    • Debian-related changes:

      • Incorporate a fix from bremner into ‘builtin-pho’ related to binary-NMUs.
      • Keep bullseye environments around longer, in an attempt to fix a Jenkins issue.
      • Improve the documentation of buildinfos.debian.net.
      • Improve documentation for the ‘builtin-pho’ setup.
    • OpenWrt-related changes:

      • Also use -j1 for better debugging.
      • Document that Python 3.x is now used.
      • Enable further debugging for the toolchain build.
    • New snapshot.reproducible-builds.org service:

      • Actually add new node.
      • Install xfsprogs on snapshot.reproducible-builds.org.
      • Create account for fpierret on new node.
      • Run node_health_check job on new node too.
  • Mattia Rizzolo:

    • Debian-related changes:

      • Handle schroot errors when invoking diffoscope instead of masking them.
      • Declare and define some variables separately to avoid masking the subshell return code.
      • Fix variable name.
      • Improve log reporting.
      • Execute apt-get update with the -q argument to get more decent logs.
      • Set the Debian HTTP mirror and proxy for snapshot.reproducible-builds.org.
      • Install the libarchive-tools package (instead of bsdtar) when updating Jenkins nodes.
    • Be stricter about errors when starting the node agent and don’t overwrite NODE_NAME so that we can expect Jenkins to properly set it for us.
    • Explicitly warn if the NODE_NAME is not a fully-qualified domain name (FQDN).
    • Document whether a node runs in the future.
    • Disable postgresql_autodoc as it is not available in bullseye.
    • Don’t be so eager when deleting schroot internals; call schroot -e to terminate the schroots instead.
    • Only consider schroot underlays for deletion that are over a month old.
    • Only try to unmount /proc if it’s actually mounted.
    • Move the db_backup task to its own Jenkins job.

Lastly, Vasyl Gello added usage information to the reproducible_build.sh script.


Contributing

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:



