Reproducible Builds in August 2019

View all our monthly reports


Welcome to the August 2019 report from the Reproducible Builds project!

In these monthly reports we outline the most important things that have happened in the world of Reproducible Builds and we have been up to.

As a quick recap of our project, whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed to end users or systems as precompiled binaries. The motivation behind the reproducible builds effort is to ensure zero changes have been introduced during these compilation processes. This is achieved by promising identical results are always generated from a given source thus allowing multiple third-parties to come to a consensus on whether a build was changed or even compromised.

In August’s month’s report, we cover:

  • Media coverage & eventsWebmin, CCCamp, etc.
  • Distribution workThe first fully-reproducible package sets, openSUSE update, etc
  • Upstream newslibfaketime updates, gzip, ensuring good definitions, etc.
  • Software developmentMore work on diffoscope, new variations in our testing framework, etc.
  • Misc newsFrom our mailing list, etc.
  • Getting in touchHow to contribute, etc

If you are interested in contributing to our project, please visit our Contribute page on our website.


Media coverage & events

A backdoor was found in Webmin a popular web-based application used by sysadmins to remotely manage Unix-based systems. Whilst more details can be found on upstream’s dedicated exploit page, it appears that the build toolchain was compromised. Especially of note is that the exploit “did not show up in any Git diffs” and thus would not have been found via an audit of the source code. The backdoor would allow a remote attacker to execute arbitrary commands with superuser privileges on the machine running Webmin. Once a machine is compromised, an attacker could then use it to launch attacks on other systems managed through Webmin or indeed any other connected system. Techniques such as reproducible builds can help detect exactly these kinds of attacks that can lay dormant for years. (LWN comments)

In a talk titled There and Back Again, Reproducibly! Holger Levsen and Vagrant Cascadian presented at the 2019 edition of the Linux Developer Conference in São Paulo, Brazil on Reproducible Builds.

LWN posted and hosted an interesting summary and discussion on Hardening the file utility for Debian. In July, Chris Lamb had cross-posted his reply to the “Re: file(1) now with seccomp support enabled” thread, originally started on the debian-devel mailing list. In this post, Chris refers to our strip-nondeterminism tool not being able to accommodate the additional security hardening in file(1) and the changes made to the tool in order to fix this issue which was causing a huge number of regressions in our testing framework.

The Chaos Communication Camp — an international, five-day open-air event for hackers that provides a relaxed atmosphere for free exchange of technical, social, and political ideas — hosted its 2019 edition where there were many discussions and meet-ups at least partly related to Reproducible Builds. This including the titular Reproducible Builds Meetup session which was attended by around twenty-five people where half of them were new to the project as well as a session dedicated to all Arch Linux related issues.


Distribution work

In Debian, the first “package sets” — ie. defined subsets of the entire archive — have become 100% reproducible including as the so-called “essential” set for the bullseye distribution on the amd64 and the armhf architectures. This is thanks to work by Chris Lamb on bash, readline and other low-level libraries and tools. Perl still has issues on i386 and arm64, however.

Dmitry Shachnev filed a bug report against the debhelper utility that speaks to issues around using the date from the debian/changelog file as the source for the SOURCE_DATE_EPOCH environment variable as this can lead to non-intuitive results when package is automatically rebuilt via so-called binary (NB. not “source”) NMUs. A related issue was later filed against qtbase5-dev by Helmut Grohne as this exact issue led to an issue with co-installability across architectures.

Lastly, 115 reviews of Debian packages were added, 45 were updated and 244 were removed this month, appreciably adding to our knowledge about identified issues. Many issue types were updated by Chris Lamb, including embeds_build_data_via_node_preamble, embeds_build_data_via_node_rollup, captures_build_path_in_beam_cma_cmt_files, captures_varying_number_of_build_path_directory_components (discussed later), timezone_specific_files_due_to_haskell_devscripts, etc.

Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. New issues were found from enabling Link Time Optimization (LTO) in this distribution’s Tumbleweed branch. This affected, for example, nvme-cli as well as perl-XML-Parser and pcc with packaging issues.


Upstream news


Software development

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. In August we wrote a large number of such patches, including:

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.

This month, Chris Lamb made the following changes:

  • Improvements:
    • Don’t fallback to an unhelpful raw hexdump when, for example, readelf(1) reports an minor issue in a section in an ELF binary. For example, when the .frames section is of the NOBITS type its contents are apparently “unreliable” and thus readelf(1) returns 1. (#58, #931962)
    • Include either standard error or standard output (not just the latter) when an external command fails. []
  • Bug fixes:
    • Skip calls to unsquashfs when we are neither root nor running under fakeroot. (#63)
    • Ensure that all of our artificially-created subprocess.CalledProcessError instances have output instances that are bytes objects, not str. []
    • Correct a reference to parser.diff; diff in this context is a Python function in the module. []
    • Avoid a possible traceback caused by a str/bytes type confusion when handling the output of failing external commands. []
  • Testsuite improvements:

    • Test for 4.4 in the output of squashfs -version, even though the Debian package version is 1:4.3+git190823-1. []
    • Apply a patch from László Böszörményi to update the squashfs test output and additionally bump the required version for the test itself. (#62 & #935684)
    • Add the wabt Debian package to the test-dependencies so that we run the WebAssembly tests on our continuous integration platform, etc. []
  • Improve debugging:
    • Add the containing module name to the (eg.) “Using StaticLibFile for ...” debugging messages. []
    • Strip off trailing “original size modulo 2^32 671” (etc.) from gzip compressed data as this is just a symptom of the contents itself changing that will be reflected elsewhere. (#61)
    • Avoid a lack of space between “... with return code 1” and “Standard output”. []
    • Improve debugging output when instantantiating our Comparator object types. []
    • Add a literal “eg.” to the comment on stripping “original size modulo...” text to emphasise that the actual numbers are not fixed. []
  • Internal code improvements:
    • No need to parse the section group from the class name; we can pass it via type built-in kwargs argument. []
    • Add support to Difference.from_command_exc and friends to ignore specific returncodes from the called program and treat them as “no” difference. []
    • Simplify parsing of optional command_args argument to Difference.from_command_exc. []
    • Set long_description_content_type to text/x-rst to appease the PyPI.org linter. []
    • Reposition a comment regarding an exception within the indented block to match Python code convention. []

In addition, Mattia Rizzolo made the following changes:

  • Now that we install wabt, expect its tools to be available. []
  • Bump the Debian backport check. []

Lastly, Vagrant Cascadian updated diffoscope to versions 120, 121 and 122 in the GNU Guix distribution.

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb made the following changes.

  • Add support for enabling and disabling specific normalizers via the command line. (#10)
  • Drop accidentally-committed warning emitted on every fixture-based test. []
  • Reintroduce the .ar normalizer [] but disable it by default so that it can be enabled with --normalizers=+ar or similar. (#3)
  • In verbose mode, print the normalizers that strip-nondeterminism will apply. []

In addition, there was some movement on an issue in the Archive::Zip Perl module that strip-nondeterminism uses regarding the lack of support for bzip compression that was originally filed in 2016 by Andrew Ayer.

Test framework

We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org.

This month Vagrant Cascadian suggested and subsequently implemented that we additionally test a varying build directory of different string lengths (eg. /path/to/123 vs /path/to/123456 but we also vary the number of directory components within this, eg. /path/to/dir vs. /path/to/parent/subdir. Curiously, whilst it was a priori believed that was rather unlikely to yield differences, Chris Lamb has managed to identify approximately twenty packages that are affected by this issue.

It was also noticed that our testing of the Coreboot free software firmware fails to build the toolchain since we switched to building on the Debian buster distribution. The last successful build was on August 7th but all newer builds have failed.

In addition, the following code changes were performed in the last month:

  • Chris Lamb: Ensure that the size the log for the second build in HTML pages was also correctly formatted (eg. “12KB” vs “12345”). []

  • Holger Levsen:

  • Mathieu Parent: Update the contact details for the Debian PHP Group. []

  • Mattia Rizzolo:

The usual node maintenance was performed by Holger Levsen [][] and Vagrant Cascadian [].


Misc news

There was a yet more effort put into our our website this month, including misc copyediting by Chris Lamb [], Mathieu Parent referencing his fix for php-pear [] and Vagrant Cascadian updating a link to his homepage. [].

On our mailing list this month Santiago Torres Arias started a Setting up a MS-hosted rebuilder with in-toto metadata thread regarding Microsoft’s interest in setting up a rebuilder for Debian packages touching on issues of transparency logs and the integration of in-toto by the Secure Systems Lab at New York University. In addition, Lars Wirzenius continued conversation regarding various questions about reproducible builds and their bearing on building a distributed continuous integration system.

Lastly, in a thread titled Reproducible Builds technical introduction tutorial Jathan asked whether anyone had some “easy” Reproducible Builds tutorials in slides, video or written document format.


Getting in touch

If you are interested in contributing the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:



This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa, Mathieu Parent and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.




View all our monthly reports