Reproducible Builds in January 2020

View all our monthly reports


Welcome to the January 2020 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to. In this month’s report, we cover:

  • Upstream news & event coverage: Reproducing the Telegram messenger, etc.
  • Software development: Updates and improvements to our tooling
  • Distribution work: More work in Debian, openSUSE & friends
  • Misc news: From our mailing list & how to get in touch etc.
What are reproducible builds?

Whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
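
To make the consensus idea concrete, here is a minimal sketch of how independent rebuilds might be compared. The builder names and artifact bytes are invented for illustration, and the majority-vote rule is an assumption of this example rather than a procedure the project prescribes.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of an artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def consensus(artifacts: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return the most common digest and the builders that disagree with it."""
    digests = {builder: sha256_hex(blob) for builder, blob in artifacts.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    dissenters = sorted(b for b, d in digests.items() if d != majority)
    return majority, dissenters


# Hypothetical artifacts produced by three independent rebuilders.
builds = {
    "builder-a": b"\x7fELF identical bytes",
    "builder-b": b"\x7fELF identical bytes",
    "builder-c": b"\x7fELF tampered bytes",
}

majority_digest, suspicious = consensus(builds)
print(suspicious)  # ['builder-c']
```

The comparison is only meaningful when the build is reproducible in the first place: otherwise every builder would produce a different digest and no consensus could form.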

If you are interested in contributing, please visit the Contribute page on our website.


Upstream news & event coverage

The Telegram messaging application has documented full instructions for verifying that its original source code is exactly the same code that is used to build the versions available on the Apple App Store and Google Play.

Reproducible builds were mentioned in a panel on Software Distribution with Sam Hartman, Richard Fontana, & Eben Moglen at the Software Freedom Law Center’s 15th Anniversary Fall Conference (at ~35m21s).

Vagrant Cascadian will present a talk at SCALE 18x in Pasadena, California on March 8th titled There and Back Again, Reproducibly.

Matt Graeber (@mattifestation) posted on Twitter that:

If you weren’t aware of the reason Portable Executable timestamps in Win 10 binaries were nonsensical, Raymond’s post explains the reason: to support reproducible builds.

… referencing an article by Raymond Chen from January 2018 which, amongst other things, mentions:

One of the changes to the Windows engineering system begun in Windows 10 is the move toward reproducible builds.

Jan Nieuwenhuizen announced the release of GNU Mes 0.22. Vagrant Cascadian subsequently uploaded this version to Debian, where it produced a mescc-mes-static binary that is bit-for-bit identical to the one in the mes-rb5 package in GNU Guix.

Software development

diffoscope

diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of nondeterministic behaviour.

This month, diffoscope versions 135 and 136 were uploaded to Debian unstable by Chris Lamb. He also made a number of changes to diffoscope itself, including:

  • New features:

    • Support external difference tools such as Meld, etc. similar to git-difftool(1). (#87)
    • Extract resources.arsc files as well as classes.dex from Android .apk files to ensure that we show the differences there. (#27)
    • Fall back to the regular .zip container format for .apk files if apktool is not available. [][][][]
    • Drop --max-report-size-child and --max-diff-block-lines-parent; scheduled for removal in January 2018. []
    • Append a comment to a difference if we fall back to a less-informative container format because we are missing a tool. [][]
  • Bug fixes:

    • No longer raise a KeyError exception if we request an invalid member from a directory container. []
  • Documentation/workflow improvements:

    • Clarify that “install X” in various outputs actually refers to system-level packages. []
    • Add a note to the Contributing documentation to suggest enabling concurrency when running the tests locally. []
    • Include the CONTRIBUTING.md file in the PyPI.org release. [][]
  • Logging improvements:

    • Log a debug-level message if we cannot open a file as container due to a missing tool to assist in diagnosing issues. []
    • Correct a debug message related to compare_meta calls to quote the arguments correctly. []
    • Add the current PATH environment variable to the Normalising locale... debug-level message. []
    • Print the Starting diffoscope $VERSION line as the first line of the log as we are, well, starting diffoscope. []
    • If we don’t know the HTML output name, don’t emit an enigmatically truncated HTML output for debug message. []
  • Tests:

    • Don’t exhaustively output the entire HTML report when testing the regression for #875281; parsing the JSON and pruning the tree should be enough. (#84)
    • Refresh and update the fixtures for the .ico tests to match the latest version of Imagemagick in Debian unstable. []
  • Code improvements:

    • Add a .git-blame-ignore-revs file to improve the output of git-blame(1) by ignoring large changes when introducing the Black source code reformatter and update the CONTRIBUTING.md guide on how to optionally use it locally. []
    • Add a noqa line to avoid a false-positive Flake8 “unused import” warning. []
    • Move logo.svg to under the doc/ directory [] and make setup.py executable [].
    • Tidy diffoscope.main’s configure method. [][][][]
    • Drop an assertion that is guaranteed by a parallel if conditional [] and an unused “Difference” import from the APK comparator. []
    • Turn down the “volume” for a recommendation in a comment. []
    • Rename the diffoscope.locale module to diffoscope.environ as we are modifying things beyond just the locale (eg. calling tzset, etc.) []
    • Factor out the generation of foo not available in path comment messages into the exception that raises them [] and factor out running all of our many zipinfo calls into a new method [].
  • trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb fixed the PyPI.org release by adding the trydiffoscope script itself to the MANIFEST file and performing another release cycle. []

In addition, Marc Herbert adjusted the cbfstool tests to search for expected keywords in the output, rather than specific output [], fixed a misplaced debugging line [] and added a “Testing” section to the CONTRIBUTING.rst [] file. Vagrant Cascadian updated to diffoscope 135 in GNU Guix.
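
The idea of falling back to the plain .zip container for .apk files can be illustrated with a small sketch that compares two archives member by member. This is not diffoscope's implementation: the helper names (`member_digests`, `diff_archives`, `make_zip`) and the sample contents are invented here, and only the Python standard library is used.

```python
import hashlib
import io
import zipfile


def member_digests(archive_bytes: bytes) -> dict:
    """Map each member name in a zip archive to the SHA-256 of its content."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
        }


def diff_archives(first: bytes, second: bytes) -> dict:
    """Report members that were added, removed or changed between archives."""
    a, b = member_digests(first), member_digests(second)
    return {
        "only_in_first": sorted(set(a) - set(b)),
        "only_in_second": sorted(set(b) - set(a)),
        "changed": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }


def make_zip(members: dict) -> bytes:
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Two stand-ins for .apk files that differ only in their compiled code.
first = make_zip({"classes.dex": b"code v1", "resources.arsc": b"strings"})
second = make_zip({"classes.dex": b"code v2", "resources.arsc": b"strings"})

report = diff_archives(first, second)
print(report["changed"])  # ['classes.dex']
```

A content-aware tool goes much further than this by decoding each member with a format-specific comparator, which is why losing a tool such as apktool yields a less informative report.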

reprotest

reprotest is our end-user tool to build the same source code twice in widely differing environments and then check the binaries produced by each build for any differences. This month, versions 0.7.11 and 0.7.12 were uploaded to Debian unstable by Holger Levsen. In addition, Iñaki Malerba improved the version test to split on the + character [] and Ross Vandegrift updated the code to allow the user to override timeouts from the surrounding environment [].
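
The build-twice-and-compare approach can be sketched in a few lines. The following is an illustration of the general technique only, not reprotest's code or command-line interface: `build_once` is a hypothetical helper, the "build" is a one-line script that deliberately leaks the TZ environment variable into its output, and only one variable is varied.

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def build_once(command, env_overrides):
    """Run the build in a fresh directory and return {filename: sha256}."""
    env = dict(os.environ, **env_overrides)
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(command, cwd=workdir, env=env, check=True)
        digests = {}
        for name in sorted(os.listdir(workdir)):
            with open(os.path.join(workdir, name), "rb") as fh:
                digests[name] = hashlib.sha256(fh.read()).hexdigest()
        return digests


# Hypothetical build step that embeds the timezone name in its output.
leaky_build = [
    sys.executable,
    "-c",
    "import os, pathlib; "
    "pathlib.Path('out.txt').write_text('built in ' + os.environ.get('TZ', '?'))",
]

first = build_once(leaky_build, {"TZ": "UTC"})
second = build_once(leaky_build, {"TZ": "Asia/Tokyo"})

unreproducible = sorted(n for n in first if first[n] != second.get(n))
print(unreproducible)  # ['out.txt']
```

Varying more of the environment between the two runs increases the chance of exposing a hidden dependency of this kind.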

Holger Levsen also made the following additional changes:

  • Drop the short timeout and use the install timeout instead. (#897442)
  • Use “real” reStructuredText comments instead of using the raw directive. []
  • Update the PyPI classifier to express we are using Python 3.7 now. []

Other tools

  • disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Chris Lamb fixed an issue by ignoring the return values of fsyncdir to ensure (for example) dpkg(1) can “flush” /var/lib/dpkg correctly [] and merged a change from Helmut Grohne to use the build architecture’s version of pkg-config to permit cross-architecture builds [].

  • strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, version 1.6.3-2 was uploaded to Debian unstable by Holger Levsen to bump the Standards-Version. []
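
The class of bug that disorderfs is designed to expose can be sketched as follows. The shuffle here is only a crude stand-in for a filesystem that returns directory entries in varying order, and the function names are invented for this example.

```python
import os
import random
import tempfile


def listing(directory, rng=None):
    """Return directory entries, optionally shuffled to mimic an unstable order."""
    names = os.listdir(directory)  # the order is not guaranteed by the OS
    if rng is not None:
        rng.shuffle(names)
    return names


def fragile_manifest(names):
    """Depends on whatever order the filesystem happened to return."""
    return "\n".join(names)


def robust_manifest(names):
    """Sorts first, so the result does not depend on directory order."""
    return "\n".join(sorted(names))


with tempfile.TemporaryDirectory() as build_dir:
    for name in ("b.o", "a.o", "c.o"):
        open(os.path.join(build_dir, name), "w").close()
    rng = random.Random(0)
    orders = [listing(build_dir, rng) for _ in range(20)]

robust = {robust_manifest(order) for order in orders}
fragile = {fragile_manifest(order) for order in orders}
print(len(robust), len(fragile))
```

The sorted variant always yields a single manifest, whereas the unsorted one can vary from run to run, which is the kind of difference that shows up when a build is run on top of disorderfs.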

Upstream development

The Reproducible Builds project detects, dissects and attempts to fix as many unreproducible packages as possible. Naturally, we endeavour to send all of our patches upstream. This month, we wrote another large number of such patches.


Distribution work

openSUSE

In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update and submitted the following bugs and patches:

Many Python packages were updated to avoid writing .pyc files with an embedded random path, including jupyter-jupyter-wysiwyg, jupyter-jupyterlab-latex, python-PsyLab, python-hupper, python-ipyevents (don’t rewrite .zip file), python-ipyleaflet, python-jupyter-require, python-jupyter_kernel_test, python-nbdime (do not rewrite .zip, avoid time-based .pyc), python-nbinteract, python-plaster, python-pythreejs, python-sidecar & tensorflow (use pip install --no-compile).
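
The packages above mostly avoid the problem by not compiling at install time. For illustration only, a different technique is to write hash-based .pyc files with a fixed display name, so that neither the source file's modification time nor the temporary build path ends up in the output. The sketch below assumes Python 3.7 or later; the module name and timestamps are invented.

```python
import marshal
import os
import py_compile
import tempfile

SOURCE = "def greet(name):\n    return 'hello ' + name\n"


def compile_in_fresh_dir(mtime):
    """Compile SOURCE inside a random temporary directory and return the .pyc bytes."""
    with tempfile.TemporaryDirectory() as build_dir:
        src = os.path.join(build_dir, "greet.py")
        out = os.path.join(build_dir, "greet.pyc")
        with open(src, "w") as fh:
            fh.write(SOURCE)
        os.utime(src, (mtime, mtime))  # simulate builds at different times
        py_compile.compile(
            src,
            cfile=out,
            dfile="greet.py",  # stable name recorded instead of the random path
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
        )
        with open(out, "rb") as fh:
            return fh.read()


first = compile_in_fresh_dir(mtime=1_000_000_000)
second = compile_in_fresh_dir(mtime=1_500_000_000)

# The 16-byte header holds the magic number, flags and a hash of the source.
header_a, header_b = first[:16], second[:16]
code_a = marshal.loads(first[16:])
code_b = marshal.loads(second[16:])
print(header_a == header_b, code_a.co_filename, code_b.co_filename)
```

With the default timestamp-based mode the header would instead contain the source file's modification time, and without `dfile` the code object would record the full temporary path.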

Debian

There was yet more progress towards making the Debian Installer images reproducible. Following on from last month’s efforts, Chris Lamb requested a status update on the Debian bug in question.

Daniel Schepler posted to the debian-devel mailing list to ask whether “running dpkg-buildpackage manually from the command line” is supported, particularly with respect to cases where having extra packages installed while the package was built resulted in either a failed build or even broken packages (eg. #948522, #887902, etc.). Our .buildinfo files could be one solution to this as they record the environment at the time of the package build.

Holger disabled scheduling of packages from the “oldstable” stretch release on tests.reproducible-builds.org. This is the first time since stretch’s existence that we are no longer testing this release.

OpenJDK, a free and open-source implementation of the Java Platform, was updated in Debian to incorporate a number of patches from Emmanuel Bourg, including:

  • Make the generated character data source files reproducible. (#933339)
  • Make the generated module-info.java files reproducible. (#933342)
  • Make the generated copyright headers reproducible. (#933349)
  • Make the build user reproducible. (#933373)

83 reviews of Debian packages were added, 32 were updated and 96 were removed this month adding to our knowledge about identified issues. Many issue types were updated by Chris Lamb, including timestamp_in_casacore_tables, random_identifiers_in_epub_files_generated_by_asciidoc, nondeterministic_ordering_in_casacore_tables, captures_build_path_in_golang_compiler, captures_build_path_via_haskell_adddependentfile & png_generated_by_plantuml_captures_kernel_version_and_builddate.

Lastly, Mattia Rizzolo altered the permissions and shared the notes.git repository which underpins the aforementioned package classifications with the entire “Debian” group on Salsa, therefore giving all DDs write access to it. This is an attempt to invite more direct contributions instead of merge requests.

Other distributions

The FreeBSD Project tweeted that:

Reproducible builds are turned on by default for -RELEASE []

… which targets the next released version of this distribution (view revision). Daniel Ebdrup followed up to note that this option:

Used to be turned on in -CURRENT when it was being tested, but it has been turned off now that there’s another branch where it’s used, whereas -CURRENT has more need to have the revision printed in uname (which is one of the things that make a build unreproducible). []

For Alpine Linux, Holger Levsen disabled the builders run by the Reproducible Builds project as our patch to the abuild utility (see December’s report) doesn’t apply anymore and thus all builds have become unreproducible again. Subsequent to this, a patch was merged upstream. []

In GNU Guix, on January 14th, Konrad Hinsen published a blog post entitled Reproducible computations with Guix which, amongst other things, remarks that:

The [guix time-machine] command actually downloads the specified version of Guix and passes it the rest of the command line. You are running the same code again. Even bugs in Guix will be reproduced faithfully!

The Yocto Project reported that they have reproducible cross-built binaries that are independent of both the underlying host distribution the build is run on and the path used for the build. This is now being continually tested on the Yocto Project’s automated infrastructure to ensure this state is maintained in the future.

Project website & documentation

There was more work performed on our website this month.

In addition, Arnout Engelen added a Scala programming language example for the SOURCE_DATE_EPOCH environment variable [], David del Amo updated the link to the Software Freedom Conservancy to remove some double parentheses [] and Peter Wu added a Debian example for the -ffile-prefix-map argument to support Clang version 10 [].
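
For readers unfamiliar with SOURCE_DATE_EPOCH, the following is a minimal Python sketch of how a build tool might honour it. The `build_timestamp` helper is a name invented here, and the example value corresponds to midnight UTC on 1 January 2020.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> datetime:
    """Use SOURCE_DATE_EPOCH if set, otherwise fall back to the current time."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Always format in UTC so the result does not depend on the local timezone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1577836800"})
print(stamp.isoformat())  # 2020-01-01T00:00:00+00:00
```

Passing the environment as a parameter keeps the helper easy to test; a real build would simply call `build_timestamp()` and pick up the variable exported by the packaging tooling.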

Testing framework

We operate a fully-featured and comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

  • Adrian Bunk:
    • Use the et_EE locale/language instead of fr_CH. In Estonian, the z character is sorted between s and t which is contrary to common incorrect assumptions about the sorting order of ASCII characters. []
    • Add ffile_prefix_map_passed_to_clang to the list of issues filtered as these build failures should be ignored. []
    • Remove the ftbfs_build_depends_not_available_on_amd64 from the list of filtered issues as this specific problem no longer exists. []
  • Holger Levsen:

    • Debian:
      • Always configure apt to ignore expired release files on hosts running in the future. []
      • Create an “oldsuites” page, showing suites we used to test in the past. [][][][][]
      • Schedule more old packages from the buster distribution. []
      • Deal with shell escaping and other options. [][][]
      • Reverse the suite ordering on the packages page. [][]
      • Show bullseye statistics on dashboard page, moving away from buster [] and additionally omit stretch [].
    • F-Droid:
      • Document the increased diskspace requirements; we require over 700 GiB now. []
    • Misc:
      • Gracefully deal with umount problems. [][]
      • Run code to show “todo” entries locally. []
      • Use mmdebstrap instead of debootstrap. [][][]
  • Jelle van der Waa (Arch Linux):

    • Set the PACKAGER variable to a valid string to avoid noise in the logging. []
    • Add a link to the Arch Linux-specific package page in the overview table. []
  • Mattia Rizzolo:
    • Fix a hard-coded reference to the current year. []
    • Ignore No server certificate defined warning messages when automatically parsing logfiles. []
  • Vagrant Cascadian special-cased u-boot on the armhf architecture: first, do not build the all architecture as the dependencies are not available on this architecture [] and, second, pass the --binary-arch argument to pbuilder [].

The usual node maintenance was performed by Mattia Rizzolo [][], Vagrant Cascadian [][][][] and Holger Levsen.
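
The switch to the et_EE locale mentioned above targets software whose output depends on locale-specific sorting. The sketch below contrasts a locale-aware sort with a plain code point sort. The function names are invented, and the locale-aware result is only available on systems where the requested locale is installed, so the helper returns None otherwise.

```python
import locale


def locale_sorted(names, locale_name):
    """Sort with the collation rules of locale_name, or return None if unavailable."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(names, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def codepoint_sorted(names):
    """Sort by Unicode code point, which is independent of the build locale."""
    return sorted(names)


names = ["tar", "zip", "sed"]
print(codepoint_sorted(names))  # ['sed', 'tar', 'zip']
print(locale_sorted(names, "et_EE.UTF-8"))
```

Where the Estonian locale is installed, the second line would be expected to place zip before tar, which is why a build that sorts with the ambient locale can differ between two otherwise identical environments.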


Misc news

On our mailing list this month:

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can also get in touch with us via:



This month’s report was written by Arnout Engelen, Bernhard M. Wiedemann, Chris Lamb, heinrich5991, Holger Levsen, Jelle van der Waa, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



