Reproducible Builds in May 2020

View all our monthly reports


Welcome to the May 2020 report from the Reproducible Builds project.

One of the original promises of open source software is that distributed peer review and transparency of process result in enhanced end-user security. Nonetheless, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

News

The Corona-Warn app that helps trace infection chains of SARS-CoV-2/COVID-19 in Germany had a feature request filed against it that it build reproducibly.

A number of academics from Cornell University have published a paper titled Backstabber’s Knife Collection which reviews various open source software supply chain attacks:

Recent years saw a number of supply chain attacks that leverage the increasing use of open source during software development, which is facilitated by dependency managers that automatically resolve, download and install hundreds of open source packages throughout the software life cycle.

In related news, the LineageOS Android distribution announced that a hacker had access to the infrastructure of their servers after exploiting an unpatched vulnerability.

Marcin Jachymiak of the Sia decentralised cloud storage platform posted on their blog that their siac and siad utilities can now be built reproducibly:

This means that anyone can recreate the same binaries produced from our official release process. Now anyone can verify that the release binaries were created using the source code we say they were created from. No single person or computer needs to be trusted when producing the binaries now, which greatly reduces the attack surface for Sia users.
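
The check described in this quote amounts to comparing a locally rebuilt binary against the officially released one, bit for bit. A minimal sketch in Python, assuming both files are already on disk; the function names and any paths passed to them are hypothetical and are not part of Sia's tooling:

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(official_path, rebuilt_path):
    """True if the two files have the same SHA-256 digest."""
    return sha256_of(official_path) == sha256_of(rebuilt_path)
```

If the digests differ, a tool such as diffoscope can then be used to locate where the two builds diverge.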

Synchronicity is a distributed build system for Rust build artifacts which have been published to crates.io. The goal of Synchronicity is to provide a distributed binary transparency system which is independent of any central operator.

The Comparison of Linux distributions article on Wikipedia now features a Reproducible Builds column indicating each distribution’s approach to, and progress towards, achieving reproducible builds.


Distribution work

In Debian this month:

In Alpine Linux, an issue was filed — and closed — regarding the reproducibility of .apk packages.

Allan McRae of the ArchLinux project posted their third Reproducible builds progress report to the arch-dev-public mailing list which includes the following call for help:

We also need help to investigate and fix the packages that fail to reproduce that we have not investigated as of yet.

In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.


Software development

diffoscope

Chris Lamb made the changes listed below to diffoscope, our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. He also prepared and uploaded versions 142, 143, 144, 145 and 146 to Debian, PyPI, etc.

  • Comparison improvements:

    • Improve fuzzy matching of JSON files as file now supports recognising JSON data. (#106)
    • Refactor .changes and .buildinfo handling to show all details (including the GnuPG header and footer components) even when referenced files are not present. (#122)
    • Use our BuildinfoFile comparator (etc.) regardless of whether the associated files (such as the orig.tar.gz and the .deb) are present. []
    • Include GnuPG signature data when comparing .buildinfo, .changes, etc. []
    • Add support for printing Android APK signatures via apksigner(1). (#121)
    • Identify “iOS App Zip archive data” as .zip files. (#116)
    • Add support for Apple Xcode .mobileprovision files. (#113)
  • Bug fixes:

    • Don’t print a traceback if we pass a single, missing argument to diffoscope (eg. a JSON diff to re-load). []
    • Correct differences typo in the ApkFile handler. (#127)
  • Output improvements:

    • Never emit the same id="foo" anchor reference twice in the HTML output, otherwise identically-named parts cannot be linked to via a #foo anchor. (#120)
    • Never emit an empty “id” anchor either; it is not possible to link to #. []
    • Don’t pretty-print the output when using the --json presenter; it will usually be too complicated to be readable by a human anyway. []
    • Use SHA256 instead of MD5 when generating page names for the HTML directory-style presenter. (#124)
  • Reporting improvements:

    • Clarify the message when we truncate the number of lines printed to standard error [] and reduce the maximum number of lines printed to 25, as the error is usually obvious by then [].
    • Print the amount of free space that we have available in our temporary directory as a debugging message. []
    • Clarify “Command […] failed with exit code” messages to remove the duplicate “exited with exit”, and also to note that diffoscope is interpreting this as an error. []
    • Don’t leak the full path of the temporary directory in “Command […] exited with 1” messages. (#126)
    • Clarify the warning message when we cannot import the debian Python module. []
    • Don’t repeat stderr from {} if both commands emit the same output. []
    • Clarify that an external command emits output for both files, otherwise it can look like we are repeating ourselves when, in reality, the command is being run twice. []
  • Testsuite improvements:

    • Prevent apksigner test failures due to lack of binfmt_misc, eg. on Salsa CI and elsewhere. []
    • Drop .travis.yml as we use Salsa instead. []
  • Dockerfile improvements:

    • Add a .dockerignore file to whitelist files we actually need in our container. (#105)
    • Use ARG instead of ENV when setting up the DEBIAN_FRONTEND environment variable at runtime. (#103)
    • Run as a non-root user in container. (#102)
    • Install/remove the build-essential package during the build so we can install the recommended packages from Git. []
  • Codebase improvements:

    • Bump the officially required version of Python from 3.5 to 3.6. (#117)
    • Drop the (default) shell=False keyword argument to subprocess.Popen so that the potentially-unsafe shell=True is more obvious. []
    • Perform string normalisation in Black [] and include the Black output in the assertion failure too [].
    • Inline MissingFile’s special handling of deb822 to prevent leaking through abstraction layers. [][]
    • Allow a bare try/except block when cleaning up temporary files with respect to the flake8 quality assurance tool. []
    • Rename in_dsc_path to dsc_in_same_dir to clarify the use of this variable. []
    • Abstract out the duplicated parts of the debian_fallback class [] and add descriptions for the file types. []
    • Various commenting and internal documentation improvements. [][]
    • Rename the Openssl command class to OpenSSLPKCS7 to accommodate other command names with this prefix. []
  • Misc:

    • Rename the --debugger command-line argument to --pdb. []
    • Normalise filesystem stat(2) “birth times” (ie. st_birthtime) in the same way we do with the stat(1) command’s Access: and Change: times to fix a nondeterministic build failure in GNU Guix. (#74)
    • Ignore case when ordering our file format descriptions. []
    • Drop, add and tidy various module imports. [][][][]
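
The two HTML anchor fixes listed under “Output improvements” above (never repeating an id and never emitting an empty one) can be illustrated with a small sketch. This is not diffoscope’s actual code; the class name, the numeric-suffix scheme and the "anchor" fallback are all assumptions made for illustration:

```python
class AnchorRegistry:
    """Hand out HTML id values that are never empty and never repeated."""

    def __init__(self):
        self._seen = set()

    def unique_id(self, name):
        # Hypothetical fallback: an empty id cannot be linked to.
        base = name or "anchor"
        candidate = base
        suffix = 1
        # Append an increasing numeric suffix until the id is unused.
        while candidate in self._seen:
            suffix += 1
            candidate = f"{base}-{suffix}"
        self._seen.add(candidate)
        return candidate
```

With a scheme like this, two identically-named parts would receive distinct ids, so each remains addressable by its own fragment link.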

In addition:

  • Jean-Romain Garnier fixed a general issue where, for example, LibarchiveMember’s has_same_content method was called regardless of the underlying type of file. []

  • Daniel Fullmer fixed an issue where some filesystems could only be mounted read-only. (!49)

  • Emanuel Bronshtein provided a patch to prevent a build of the Docker image containing parts of the build’s. (#123)

  • Mattia Rizzolo added an entry to debian/py3dist-overrides to ensure the rpm-python module is used in package dependencies (#89) and moved to using the new execute_after_* and execute_before_* Debhelper rules [].


Chris Lamb also performed a huge overhaul of diffoscope’s website:

  • Add a completely new design. [][]
  • Dynamically generate our contributor list [] and supported file formats [] from the main Git repository.
  • Add a separate, canonical page for every new release. [][][]
  • Generate a ‘latest release’ section and display that with the corresponding date on the homepage. []
  • Add an RSS feed of our releases [][][][][] and add to Planet Debian [].
  • Use Jekyll’s absolute_url and relative_url where possible [][] and move a number of configuration variables to _config.yml [][].


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Other tools

Elsewhere in our tooling:

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In May, Chris Lamb uploaded version 1.8.1-1 to Debian unstable and Bernhard M. Wiedemann fixed an “off-by-one” error when parsing PNG image modification times. (#16)
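
Timestamps recorded in build outputs are a common source of non-determinism. As a rough illustration of the kind of normalisation involved, file modification times can be clamped to a fixed reference time such as the value of SOURCE_DATE_EPOCH. This is a Python sketch of the general idea only, not strip-nondeterminism’s own code (which also handles file-format internals such as PNG metadata); the function name is hypothetical:

```python
import os


def clamp_mtime(path, source_date_epoch):
    """Set the mtime of `path` to `source_date_epoch` if it is newer.

    Returns True if the timestamp was changed.
    """
    stat = os.stat(path)
    if stat.st_mtime <= source_date_epoch:
        return False
    # Keep the access time; only the modification time is clamped.
    os.utime(path, (stat.st_atime, source_date_epoch))
    return True
```

Clamping, rather than overwriting every timestamp, leaves files that are already older than the reference time untouched.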

In disorderfs, our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues, Chris Lamb replaced the term “dirents” with “directory entries” in human-readable output/log messages [] and applied the astyle source code formatter with the default settings to the main disorderfs.cpp source file [].
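
The class of bug that disorderfs is designed to expose can be sketched in a few lines of Python. Directory listings come back in an unspecified, filesystem-dependent order, so any build step that consumes them unsorted may produce different output on different machines. The function names below are hypothetical:

```python
import os


def list_sources_unstable(directory):
    # os.listdir() returns entries in arbitrary, filesystem-dependent order.
    return os.listdir(directory)


def list_sources_stable(directory):
    # Sorting makes the result independent of the underlying filesystem.
    return sorted(os.listdir(directory))
```

Running a build on top of a filesystem that shuffles directory entries makes the first variant misbehave visibly, whereas the second is unaffected.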

Holger Levsen bumped the debhelper-compat level to 13 in disorderfs [] and reprotest [], and for the GNU Guix distribution Vagrant Cascadian updated disorderfs to version 0.5.10 [] and diffoscope to version 145 [].

Project documentation & website

  • Carl Dong:

  • Chris Lamb:

    • Rename the “Who” page to “Projects”. []
    • Ensure that Jekyll enters the _docs subdirectory to find the _docs/index.md file after an internal move. (#27)
    • Wrap ltmain.sh etc. in preformatted quotes. []
    • Wrap the SOURCE_DATE_EPOCH Python examples onto more lines to prevent visual overflow on the page. []
    • Correct a “preferred” spelling error. []
  • Holger Levsen:

    • Sort our Academic publications page by publication year [] and add “Trusting Trust” and “Fully Countering Trusting Trust through Diverse Double-Compiling” [].
  • Juri Dispan:

Testing framework

We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, it tracks the status of our reproducibility efforts and identifies any regressions that have been introduced. Holger Levsen made the following changes:

  • System health status:

    • Improve page description. []
    • Add more weight to proxy failures. []
    • More verbose debug/failure messages. [][][]
    • Work around strangeness in the Bash shell — let VARIABLE=0 exits with an error. []
  • Debian:

    • Fail loudly if there are more than three .buildinfo files with the same name. []
    • Fix a typo which prevented /usr merge variation on Debian unstable. []
    • Temporarily ignore PHP’s horde packages in Debian bullseye. []
    • Document how to reboot all nodes in parallel, working around molly-guard. []
  • Further work on a Debian package rebuilder:

    • Workaround and document various issues in the debrebuild script. [][][][]
    • Improve output in the case of errors. [][][][]
    • Improve documentation and future goals [][][][], in particular documenting two real-world test cases for an “impossible to recreate build environment” [].
    • Find the right source package to rebuild. []
    • Increase the frequency we run the script. [][][][]
    • Improve downloading and selection of the sources to build. [][][]
    • Improve version string handling. []
    • Handle build failures better. [][][]
    • Also consider “architecture all” .buildinfo files. [][]

In addition:

  • kpcyrd, for Alpine Linux, updated the alpine_schroot.sh script now that a patch for abuild had been released upstream. []

  • Alexander Couzens of the OpenWrt project renamed the brcm47xx target to bcm47xx. []

  • Mattia Rizzolo fixed the printing of the build environment during the second build [][][] and made a number of improvements to the script that deploys Jenkins across our infrastructure [][][].

Lastly, Vagrant Cascadian clarified in the documentation that you need to be user jenkins to run the blacklist command [] and the usual build node maintenance was performed by Holger Levsen [][][], Mattia Rizzolo [][] and Vagrant Cascadian [][][].


Mailing list:

There were a number of discussions on our mailing list this month:

Paul Spooren started a thread titled Reproducible Builds Verification Format which reopens the discussion around a schema for sharing the results from distributed rebuilders:

To make the results accessible, storable and create tools around them, they should all follow the same schema, a reproducible builds verification format. The format tries to be as generic as possible to cover all open source projects offering precompiled source code. It stores the rebuilder results of what is reproducible and what not.
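
To make the idea of a shared schema concrete, here is a minimal sketch of how a rebuilder might record and serialise its results. It is an illustration only: every field name below is hypothetical and is not taken from the proposed format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RebuildResult:
    # All field names are hypothetical, not taken from the proposal.
    name: str
    version: str
    artifact_sha256: str
    reproducible: bool


def to_json(results):
    """Serialise results deterministically (stable order, sorted keys)."""
    ordered = sorted(results, key=lambda r: (r.name, r.version))
    return json.dumps([asdict(r) for r in ordered], sort_keys=True, indent=2)
```

Sorting both the records and their keys means that two rebuilders reporting the same results emit byte-identical documents, which makes the reports themselves easy to compare.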

Hans-Christoph Steiner of the Guardian Project also continued his previous discussion regarding making our website translatable.

Lastly, Leo Wandersleb posted a detailed request for feedback on a question of supply chain security and other issues of software review; Leo is the founder of the Wallet Scrutiny project which aims to prove the security of Android Bitcoin Wallets:

Do you own your Bitcoins or do you trust that your app allows you to use “your” coins while they are actually controlled by “them”? Do you have a backup? Do “they” have a copy they didn’t tell you about? Did anybody check the wallet for deliberate backdoors or vulnerabilities? Could anybody check the wallet for those?

Elsewhere, Leo had posted instructions on his attempts to reproduce the binaries for the BlueWallet Bitcoin wallet for iOS and Android platforms.




If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:


This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



