Reproducible Builds in January 2021

View all our monthly reports


Welcome to the report from the Reproducible Builds project for January 2021. In our reports we outline the most important things that have happened in the world of reproducible builds in the past month. If you are interested in contributing to the project, please visit our Contribute page on our website.


There has been further discussion in security circles around the recent ‘SolarWinds’ supply-chain attack (covered in our report last month). This month, David A. Wheeler posted an article on the Linux Foundation’s blog titled Preventing Supply Chain Attacks like SolarWinds.

Noting that “assuming a system can never be broken into is a failing strategy”, David continues:

In the longer term, I know of only one strong countermeasure for this kind of attack: verified reproducible builds. A “reproducible build” is a build that always produces the same outputs given the same inputs so that the build results can be verified. A verified reproducible build is a process where independent organizations produce a build from source code and verify that the built results come from the claimed source code. Almost all software today is not reproducible, but there’s work to change this.
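The final “verify” step can be illustrated with a minimal sketch. It assumes each independent builder has already built the same source and made its artifact available; the helper names sha256_of and verify_independent_builds are hypothetical, and the sketch shows only the core comparison, not any particular project’s tooling.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_independent_builds(artifacts: dict[str, Path]) -> bool:
    """Return True only if at least two builders produced bit-for-bit identical artifacts.

    artifacts maps a (hypothetical) builder name to the file it produced.
    """
    digests = {builder: sha256_of(path) for builder, path in artifacts.items()}
    for builder, digest in sorted(digests.items()):
        print(f"{builder}: {digest}")
    # Verification needs at least two independent results, and they must all agree.
    return len(digests) >= 2 and len(set(digests.values())) == 1
```

A single mismatching digest is enough to flag a build for closer investigation, for example with diffoscope.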

In addition, Episode 101 of the Ubuntu Security Podcast covered the SolarWinds hack in further detail.


Elsewhere, the Bootstrappable Builds project was covered in depth by Jake Edge on Linux Weekly News. Jake introduced this sister project as follows:

The Bootstrappable Builds project was started as an offshoot of the Reproducible Builds project during the latter’s 2016 summit in Berlin. A bootstrappable build takes the idea of reproducibility one step further, in some sense. The build of a target binary can be reproduced alongside the build of the tools required to do so. It is, conceptually, almost like building a house from a large collection of atoms of different elements.

[…]

Building software depends on the tools used to construct the binary, including compilers and build-automation tools, many of which depend on pre-existing binaries. Minimizing the reliance on opaque binaries for building our software ecosystem is the goal of the Bootstrappable Builds project.

The full article is available on the LWN website.


Outreachy is an initiative that funds three-month remote internships in free and open source software, with a focus on supporting diversity. The Reproducible Builds project is considering joining this round and is seeking input and ideas for good proposals.

Examples of the kind of projects we are looking for include workflow changes, large refactoring work, new features for our tools, specific reproducibility fixes and so on. Ideas should fit in that sweet spot of requiring more time and energy than a weekend project, without being so complicated that they would take forever. For more information, please see Mattia’s announcement on our mailing list.

Software development

Debian

In recent months there has been preparatory work to enable the reproducible=+fixfilepath build flag by default; enabling this fixfilepath feature flag should fix reproducibility issues in an estimated 500-700 packages. In January, Guillem Jover uploaded dpkg version 1.20.6 to Debian unstable with this flag enabled by default. Although a bug (#979570) was subsequently filed by Lisandro Damián Nicanor Pérez Meyer, initially intending to pause this change due to a problem with the Qt toolkit, it was closed after extensive discussion.
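As we understand dpkg-buildflags(1), the fixfilepath feature adds -ffile-prefix-map=BUILDPATH=. to CFLAGS, CXXFLAGS and the other compiler flag variables, where BUILDPATH is the top-level directory of the package being built. The compiler then rewrites that prefix wherever it would otherwise embed an absolute source path (for example via __FILE__ or in debug information). The Python sketch below only models that prefix rewriting to show why two builds in different directories converge; the function names and build directories are hypothetical, and the real rewriting happens inside the compiler.

```python
def fixfilepath_flag(build_path: str) -> str:
    """The compiler flag that fixfilepath adds for a given build directory."""
    return f"-ffile-prefix-map={build_path}=."


def remap(path: str, old: str, new: str) -> str:
    """Model of -ffile-prefix-map=OLD=NEW: replace a leading OLD with NEW.

    This is a plain string-prefix match, not a path-component match.
    """
    if path.startswith(old):
        return new + path[len(old):]
    return path


# Two builds of the same package in different (hypothetical) build directories
# end up embedding the same relative path.
for build_dir in ("/build/hello-AbC123", "/build/hello-XyZ789"):
    embedded = remap(f"{build_dir}/src/main.c", build_dir, ".")
    print(fixfilepath_flag(build_dir), "->", embedded)
```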

In recent weeks, Holger Levsen has been re-uploading a large number of Debian packages in an attempt to ensure they all have a corresponding .buildinfo file. Holger described his rationale and approach in a blog post in December titled On doing 540 no-source-change source-only uploads in two weeks. In January, Holger performed a further 2,940 of these uploads, bringing the number of packages in Debian bullseye that lack these files down to eleven (from over 3,500). Holger wrote about his progress on our mailing list, where he also describes how he intends to eliminate the remaining packages.

Lukas Puehringer, Frédéric Pierret and Holger Levsen collaborated to upload apt-transport-in-toto version 0.1.0 to the unstable distribution, and Lukas Puehringer prepared packages of in-toto version 1.0.0 and python-securesystemslib 0.18.0 for the unstable distribution.

35 reviews of Debian packages were added, 58 were updated and 49 were removed this month, adding to our extensive knowledge about identified issues. Chris Lamb identified two new issue types, build_path_added_by_src2man_from_txt2man and nondeterminstic_todo_identifiers_in_documentation_generated_by_doxygen, and Thorsten Glaser added a new uid_and_gid_in_cmake-generated_pkzip issue type [].

Other distributions

Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE which mentions amongst other things that “4.10% of packages are not perfectly reproducible”.

Jelle van der Waa posted an overview of Arch Linux’s work on reproducible builds during 2020. Titled Arch Linux Reproducible Builds Progress 2020, it mentions (for example) that their rebuilderd tool has seen 13 releases since March 2020.

reprotest

reprotest is our end-user tool to build the same source code twice in widely differing environments and then check the binaries produced by each build for any differences. This month, the following changes were made:

  • Frédéric Pierret:

    • Add an RPM spec file. [][][]
    • Improve the tests. [][]
    • Fix a number of deprecation warnings. []
    • Update .gitignore. []
    • Replace deprecated warn method in logging routines. []
    • Improve documentation on available verbosity values. []
    • Update README documentation for RPM support. []
  • Holger Levsen:

    • Upload to Debian unstable. []
  • Marek Marczykowski-Górecki:

    • When running continuous integration, don’t run reprotest on reprotest itself. []
    • Disable running tests on Python 3.8. []
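The idea behind reprotest can be sketched in a few lines of Python: run the same build under deliberately different conditions and compare the outputs. In this hypothetical sketch a “build” is just a Python callable returning bytes, and only two environment variables (TZ and LC_ALL) are varied; reprotest itself is a command-line tool that varies many more aspects of the build environment and uses diffoscope to explain any differences it finds.

```python
from __future__ import annotations

import hashlib
import os
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical variations, similar in spirit to reprotest's timezone/locale ones.
VARIATIONS = [
    {"TZ": "UTC", "LC_ALL": "C.UTF-8"},
    {"TZ": "Etc/GMT-14", "LC_ALL": "fr_FR.UTF-8"},
]


@contextmanager
def patched_env(overrides: dict[str, str]) -> Iterator[None]:
    """Temporarily apply environment overrides, restoring the old values."""
    saved = {k: os.environ.get(k) for k in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v


def builds_reproducibly(build: Callable[[], bytes]) -> bool:
    """Run build under each variation; True if every output is identical."""
    digests = set()
    for env in VARIATIONS:
        with patched_env(env):
            digests.add(hashlib.sha256(build()).hexdigest())
    return len(digests) == 1
```

For example, a build that embeds the value of TZ in its output fails this check, while one whose output ignores the environment passes.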

diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it also provides human-readable diffs from many kinds of binary formats. This month, Chris Lamb made a large number of changes (including releasing version 164, version 165 and version 166):

  • New features:

    • Save sys.argv in our top-level temporary directory, in case it helps debug why temporary directories might not get cleaned up. []
    • Collapse the --acl and --xattr arguments into --extended-filesystem-attributes to cover all of these extended attributes, defaulting the new option to false (i.e. not performing these expensive external calls). [][]
  • Bug fixes:

    • Explicitly remove our top-level temporary directory. (#981123)
    • Adjust the fuzzy matching threshold to ensure that we show more differences. [][]
    • Use magic.Magic() over the now-deprecated magic.open() compatibility interface. []
  • Output improvements:

    • Show the ‘fuzziness’ amount in percentage terms, not out of the rather-arbitrary ‘400’. []
    • Improve help text for the --exclude-directory-metadata argument. []
    • Add a missing profiling point around our external call to cmp(1). []
    • Truncate jsondiff differences at 512 bytes, in case they consume the entire page. []
    • Improve the logging around fuzzy matching. []
  • Codebase improvements:

    • Clarify in a comment that __del__ is not always called in Python, so temporary directories are not necessarily removed the moment they go out of scope. []
    • Print the free space in our temporary directory when we create it, not from within diffoscope.main. []
    • Tidy the diffoscope.comparators.utils.fuzzy module. []
    • Add a note regarding the special ordering of test_all_tools_are_listed within that module. []
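Several of the temporary-directory changes above rest on a general Python rule: __del__ is not guaranteed to run (for example, for objects that still exist when the interpreter exits), so cleanup should be explicit. A minimal sketch of that pattern, with hypothetical function names and file name rather than diffoscope’s own code, creates one top-level directory per run, records sys.argv inside it as a debugging aid, and registers an explicit removal with atexit.

```python
import atexit
import os
import shutil
import sys
import tempfile
from typing import Optional

_top_level: Optional[str] = None


def get_top_level_temporary_directory() -> str:
    """Create the per-run temporary directory on first use and return it."""
    global _top_level
    if _top_level is None:
        _top_level = tempfile.mkdtemp(prefix="sketch_")
        # Record how we were invoked, to help debug directories left behind.
        with open(os.path.join(_top_level, "argv.txt"), "w") as f:  # hypothetical name
            f.write(repr(sys.argv))
        # Explicit cleanup at exit: don't rely on __del__ being called.
        atexit.register(clean_up_temporary_directory)
    return _top_level


def clean_up_temporary_directory() -> None:
    """Remove the top-level directory; safe to call more than once."""
    global _top_level
    if _top_level is not None:
        shutil.rmtree(_top_level, ignore_errors=True)
        _top_level = None
```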

Other changes were made by:

  • Conrad Ratschan:

    • Add a comparator for U-Boot Flattened Image Tree (FIT) files. [][]
  • Dimitrios Apostolou:

    • Introduce the --no-acl and --no-xattr arguments (later collapsed to --extended-filesystem-attributes by Chris Lamb) to improve performance. []
    • Avoid calling the external stat command. []
    • Avoid invoking external diff command for short outputs that are identical. []
    • Log when the cmp command is spawned. []
    • Improve performance of the has_same_content routine by spawning cmp less frequently. []
    • Clean up the FIFO files when our context manager exits. []
  • Mattia Rizzolo:

    • Add missing lipo and otool external tools, and add a test to make sure they are all listed. []
    • Fix a possible crash in the --list-debian-substvars command. []
    • Filter the content of the debian/*.substvars files. []
    • Ignore/hide the DeprecationWarning pertaining to the imp module deprecation as it comes from a 3rd-party library. []
    • Add a pytest.ini to explicitly generate JUnit’s xunit2 format. []
    • Override several Lintian warnings regarding prebuilt test binaries existing in the source tree. []
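Several of the performance changes above come down to avoiding external processes when a cheaper check will do. As an illustration only (a swapped-in, pure-Python technique, not diffoscope’s actual has_same_content routine), a file-identity check can reject files of different sizes without reading them and otherwise compare them in chunks, stopping at the first difference. The standard library’s filecmp.cmp(a, b, shallow=False) takes a similar in-process approach.

```python
import os


def has_same_content(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if both files have identical bytes, doing cheap checks first."""
    # 1. Different sizes can never be identical: no need to read anything.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # 2. The same file on disk (e.g. a hard link) is trivially identical.
    if os.path.samefile(path_a, path_b):
        return True
    # 3. Compare chunk by chunk, stopping at the first mismatch.
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a, chunk_b = a.read(chunk_size), b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                return True
```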

Other tools

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb ensured that the tool does not process unwritable files, printing a warning instead (#980356), and made a number of codebase improvements, including reflowing logic to make larger future changes easier. []
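strip-nondeterminism itself is written in Perl and handles many file formats. Purely to illustrate the idea, here is a hypothetical Python sketch that normalises one well-known source of non-determinism, the per-entry timestamps in a ZIP archive, and (mirroring this month’s change) skips unwritable files with a warning. The fixed timestamp is an assumption of this sketch, not the value the real tool uses.

```python
import os
import shutil
import tempfile
import warnings
import zipfile

# Assumption for this sketch: the earliest timestamp a ZIP entry can store.
CANONICAL_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def normalize_zip(path: str) -> bool:
    """Rewrite path so every entry carries CANONICAL_DATE_TIME.

    Returns False (after warning) if the file is not writable.
    """
    if not os.access(path, os.W_OK):
        warnings.warn(f"{path} is not writable; skipping")
        return False
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp, "w") as dst:
        for info in src.infolist():
            # A fresh ZipInfo also drops extra fields, which can hold timestamps too.
            clean = zipfile.ZipInfo(info.filename, date_time=CANONICAL_DATE_TIME)
            clean.compress_type = info.compress_type
            clean.external_attr = info.external_attr
            dst.writestr(clean, src.read(info))
    shutil.copymode(path, tmp)  # keep the original file's permission bits
    os.replace(tmp, path)
    return True
```

After normalisation, two archives that differ only in their entry timestamps become byte-for-byte identical.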

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, Chris Lamb updated the benchmarking tools to use a tool that calls stat(2) repeatedly [] and Frédéric Pierret added an RPM spec file [] as well as the ability to prepend flags to CXXFLAGS []. Holger Levsen uploaded these changes to Debian unstable as version 0.5.11-1. disorderfs was also featured on Hacker News.
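The kind of bug disorderfs flushes out is easy to reproduce without FUSE. The hypothetical sketch below simulates shuffled readdir(3) order (disorderfs can shuffle directory entries on a real mounted filesystem) and shows that output built directly from directory order varies between runs, while sorting the entries first makes it stable.

```python
from __future__ import annotations

import os
import random


def listdir_shuffled(path: str, rng: random.Random) -> list[str]:
    """Return directory entries in a random order, imitating what disorderfs
    can do to readdir(3) results (a simulation, not a FUSE mount)."""
    entries = os.listdir(path)
    rng.shuffle(entries)
    return entries


def manifest_naive(names: list[str]) -> str:
    # Bug: the output order depends on the order the filesystem returns entries.
    return "\n".join(names)


def manifest_fixed(names: list[str]) -> str:
    # Fix: impose a stable order before emitting anything.
    return "\n".join(sorted(names))
```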

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework

The Reproducible Builds project operates a large Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

  • Hans-Christoph Steiner (F-Droid):

    • Include the correct path for debug files. []
    • Copy mystery files to /tmp for debugging purposes. []
    • Add a call to git info to debug why a directory is considered ‘dirty’. []
  • Holger Levsen:

    • Debian:

    • Detect disk space issues on the main Jenkins node. []
    • In Arch Linux testing, drop support for .tar.xz as pkg.tar.zst has taken over. []
    • Create a preliminary README.txt for buildinfos.debian.net. []
    • Update hard-coded instance of 2020 - happy new year! []
  • Johannes Schauer:

    • Update the README.txt for the reproducible_pool_buildinfos.sh script. []
  • Mattia Rizzolo:

    • Use a lockfile to ensure builders do not start when not required. [][]
    • When powercycling arm64 nodes, use the unprivileged user instead of what is locally configured as ‘root’. []
    • Remove the “Static Analysis Utilities” Jenkins plugin. []
    • Update the deployment tool to set the correct HOME environment variable when running Git to avoid printing warnings. []
    • Perform a large number of Ubuntu-related configuration changes. [][][][][][][]
    • Update various host lists. [][]
    • (Re-)add some handy shortcuts. []

Lastly, build node maintenance was performed by Holger Levsen [][], Mattia Rizzolo [][][][][] and Vagrant Cascadian [][][].

Community news

Chris Lamb updated the main Reproducible Builds website and documentation, including adding a missing image [] and updating a script to ignore commits that start with, for example, ‘2020 12’ when generating commit listings [].

On our mailing list this month:

Contact

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:



