Reproducible Builds in November 2019

View all our monthly reports


Welcome to the November 2019 report from the Reproducible Builds project.

As a summary of our project, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised.
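
As a rough illustration of that consensus step, the sketch below compares the SHA-256 digests of artifacts produced by several independent builders. It is only a minimal example and not project tooling: the function names and the idea of passing each builder's artifact as a byte string are assumptions made for illustration.

```python
import hashlib
from typing import Mapping


def artifact_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact's contents."""
    return hashlib.sha256(data).hexdigest()


def builds_agree(artifacts: Mapping[str, bytes]) -> bool:
    """Return True if every builder produced bit-for-bit identical output.

    `artifacts` maps a (hypothetical) builder name to the bytes it built.
    """
    digests = {artifact_digest(data) for data in artifacts.values()}
    return len(digests) == 1
```

If even one builder reports a different digest, the build is either not yet reproducible or one of the build environments deserves a closer look.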

In this month’s report, we cover:

  • Media coverage and events: Enter the Reproducibility Challenge, etc.
  • Upstream news: OCaml, Mes, Maven, etc.
  • Distribution work: The latest reports from Arch, Debian and openSUSE, etc.
  • Software development: Holiday bonanza of patches, work on diffoscope, etc.
  • Contributing: How to get in touch…

If you are interested in contributing to our project, please visit our Contribute page on our website.


Media coverage and events

We held our fifth annual Reproducible Builds summit between the 1st and 8th December in Marrakesh, Morocco. A full, in-depth report will be posted next month…

On November 16th, Vagrant Cascadian presented There and Back Again, Reproducibly at SeaGL in Seattle, Washington.

Chris Lamb was featured on The Manifest package management podcast in an episode called Reproducible Builds project and Debian package management.

ReScience C is an open-access journal that targets computational research and encourages the explicit replication of already published research. This month they announced their Ten Years Reproducibility Challenge which promotes the idea that old code — in this instance, a “scientific article [published] before January 1st 2010” — should also run on modern hardware and software in order to check one can obtain the same scientific results in the future.


Upstream news

Mike Hommey pushed a change to the Mozilla build system to add and print error messages when differences are found between builds, as requested in bug #1597903.

There was fresh activity on an old pull request for the OCaml programming language regarding the usage and adoption of the BUILD_PATH_PREFIX_MAP environment variable, which is used to ensure that software packages do not embed build-time paths into generated files. On the pull request in question, Gabriel Scherer was kind enough to provide many helpful examples on how to use the rewrite rules.
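
To give a feel for how such rewrite rules behave, here is a simplified sketch in Python. It assumes the variable holds a ":"-separated list of "target=source" pairs in which the rightmost matching rule wins, and it deliberately ignores the percent-escaping of special characters; the function names and example paths are hypothetical and this is not the OCaml implementation.

```python
def parse_prefix_map(value):
    """Parse 'target=source' pairs separated by ':' (escaping not handled)."""
    pairs = []
    for item in value.split(":"):
        if not item:
            continue  # tolerate empty entries in this simplified sketch
        target, source = item.split("=", 1)
        pairs.append((target, source))
    return pairs


def rewrite_path(path, pairs):
    """Replace a build-time prefix of `path`, preferring the rightmost rule."""
    for target, source in reversed(pairs):
        if path.startswith(source):
            return target + path[len(source):]
    return path  # no rule applies, so the path is left untouched
```

For example, with the value "/usr/src/app=/home/alice/app", the build-time path /home/alice/app/lib/x.ml would be recorded as /usr/src/app/lib/x.ml regardless of where the source was unpacked.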

Jan Nieuwenhuizen announced the release of GNU Mes 0.21 and Jeremiah Orians announced the release of mescc-tools-seed version 1.1:

Capable of bootstrapping from a simple hex assembler all the way to a cross-platform C compiler. Work is still ongoing [to] result in a full bootstrap from a 357 byte bootstrap binary all the way to GCC.

Hervé Boutemy announced the release of three base Apache Maven plugins (maven-source-plugin, maven-jar-plugin and maven-assembly-plugin 3.2.0) to get Reproducible Builds as a “direct output” from this build system. For more information, please see the “Configuring for Reproducible Builds” section of their documentation.

Eli Schwartz reported a bug against the GNU groff typesetting system for incomplete SOURCE_DATE_EPOCH environment variable support; the output files appeared to be embedding the build timezone.
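
For context, a tool that honours SOURCE_DATE_EPOCH typically needs to both read the variable and format the result in UTC, otherwise the builder's timezone can still leak into the output. The following is a minimal Python sketch of that pattern; it is not groff's code, and the function names are assumptions made for illustration.

```python
import os
import time
from datetime import datetime, timezone


def build_datetime(environ=None):
    """Return the build time, preferring SOURCE_DATE_EPOCH over the clock."""
    environ = os.environ if environ is None else environ
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    # Always convert in UTC so the builder's local timezone is irrelevant.
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def format_build_date(environ=None):
    """Render the build time as a timezone-independent string."""
    return build_datetime(environ).strftime("%Y-%m-%d %H:%M:%S UTC")
```

Using local time in the final formatting step would reintroduce the kind of timezone-dependent difference described in the bug report.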


Distribution work

Arch Linux

A slight but temporary decline in the Arch Linux reproducibility status was determined to be due to a bug in the continuous integration framework where one build was building with --nocheck whilst the other did not, resulting in the test dependencies being installed on one build. This led to differences in the BUILDINFO file which records the build dependencies.
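
This kind of drift can be spotted by comparing the dependency lists recorded by the two builds. The sketch below assumes a simple "key = value" file layout with one "installed" entry per package; the function names and package strings are hypothetical and this is not the code used by the continuous integration framework.

```python
def installed_packages(buildinfo_text):
    """Collect the values of all 'installed = ...' lines into a set."""
    installed = set()
    for line in buildinfo_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.strip() == "installed":
            installed.add(value.strip())
    return installed


def dependency_drift(first, second):
    """Return packages present only in the first or only in the second build."""
    a = installed_packages(first)
    b = installed_packages(second)
    return sorted(a - b), sorted(b - a)
```

A build run with the test suite enabled would then show its extra test dependencies in the first list, while the list for the --nocheck build stays empty.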

Morten Linderud (Foxboron) wrote a blog post on the progress of reproducible builds for Arch packages, including how to reproduce packages and a roadmap of future work.

The standard Arch development tools package (devtools) now contains a new tool called makerepropkg which can reproduce a package from the Arch repositories given a seed PKGBUILD file.

A lot of work has been put into getting the “[core]” system more reproducible; every package has been rebuilt with a new version of pacman which resolved a previous issue with storing the package size. Build failures and download issues have also been resolved, which has led to an increase in reproducible packages in this distribution’s continuous integration setup.

openSUSE

Bernhard M. Wiedemann posted a summary of openSUSE updates for 2019 including rpm, a high-level openSUSE status and fixing problems with .pyc files, which is also relevant to Arch Linux.

The report also summarises the current reproducibility status as follows:

In addition to this, Bernhard also published his monthly Reproducible Builds status update.

Debian

Thorsten Glaser filed a bug against the debhelper packaging library to request that it sets and exports a umask of 022 for all operations as a possible “harmonisation potential”. A varying umask can result in unreproducible packages as the file permissions on the build system can be embedded into archives generated by the build system.
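
To illustrate why the umask matters, the sketch below writes a tar archive in Python while normalising each entry's permission bits, so that files created under a umask of 002 or 022 produce the same archive. This is only an illustration of the general technique and not how debhelper works; the helper names and the choice of 0755/0644 as canonical modes are assumptions.

```python
import io
import tarfile


def normalise_mode(info):
    """Force permission bits that do not depend on the builder's umask."""
    if info.isdir() or info.mode & 0o100:
        info.mode = 0o755  # directories and executables
    else:
        info.mode = 0o644  # regular, non-executable files
    return info


def build_archive(name, data, mode):
    """Create an in-memory tar archive containing one normalised entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.mode = mode  # the mode as found on the build system
        tar.addfile(normalise_mode(info), io.BytesIO(data))
    return buf.getvalue()
```

When archiving files from disk, the same function can be passed as the filter argument of TarFile.add. Timestamps and ownership would need similar treatment, which this sketch leaves out.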

Chris Lamb categorised a large number of packages and issues in the Reproducible Builds “notes” repository, including adding a new ocaml_dune_captures_build_path toolchain issue.

Vagrant Cascadian filed a bug against the Lintian static analyser for Debian packages to request that it checks for missing and/or unsigned .buildinfo files. He also uploaded the latest version of GNU Mes to the unstable distribution.

Other

Natanael Copa (@n_copa) posted on Twitter that he was finally able to make a fully reproducible package for Alpine Linux.

The NixOS distribution announced that they plan to run a Christmas Hackathon hosted by Smarkets in London, England on 9th December.


Software development

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.

diffoscope versions 131, 132 and 133 were uploaded to Debian unstable by Chris Lamb. He also made the following changes:

  • New features / improvements:
    • Allow all possible .zip file variations to return from external tools with non-zero exit codes, not just known types we can identify (e.g. Java .jmod and .jar files). (#78)
    • Limit .dsc and .buildinfo file matching to files in ASCII or UTF-8 format. (#77)
    • Bump the previous max_page_size limit from 400 kB to 4 MB.
    • Clarify in the HTML and text outputs that the limits are per-format, not global. (#944882)
    • Don’t use line-based buffering when communicating with subprocesses in “binary” mode. (#75)
  • Regression fixes:
    • Correct the substitution/filtering of paths in ELF output to avoid unnecessary differences depending on the path name provided and commandline. (#945572)
    • Silence/correct a Python SyntaxWarning message due to incorrectly comparing an integer by identity vs. equality. (#945531)
  • Testsuite improvements:
    • Refresh the OCaml test fixtures to support versions greater than 4.08.1.
    • Update an Android manifest test to reflect that parsed XML attributes are returned in a new/sorted manner under Python 3.8.
    • Dramatically truncate the tcpdump expected diff to 8 KB from ~600 KB to reduce the size of the release tarball.
    • Add a self-test to encourage that new test data files are generated dynamically or at least no new ones are added without an explicit override.
    • Add a comment that the text_ascii1 and text_ascii2 fixture files are used in multiple tests so it is not trivial to remove/replace them.
    • Drop two more test fixture files for the directory tests.
    • Don’t run our self-test against the output of the Black source code reformatter with versions earlier than “ours” as it will generate different results.
    • Update an XML test for Python 3.8.
    • Drop an unused BASE_DIR global.
  • Code improvements:
    • Rework a long string of or statements into a loop with a break.
    • Update code to reflect the latest version of the Black source code reformatter.
    • Correct a reference to the .rdx extension suffix in a comment.
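
As background to the SyntaxWarning fix listed above: Python 3.8 began warning when a literal is compared using "is", because identity checks on integers only happen to work for values the interpreter caches. The snippet below is a generic illustration of the corrected pattern and is not taken from the diffoscope source; the function name is hypothetical.

```python
def exited_cleanly(returncode):
    """Return True if a subprocess exit status indicates success.

    The buggy form would be `returncode is 0`, which compares object
    identity and triggers a SyntaxWarning on Python 3.8 and later.
    """
    return returncode == 0  # compare by value, not by identity


# Two equal integers built at runtime are equal by value, whether or not
# the interpreter happens to reuse a single object for them.
first = int("1000")
second = int("1000")
values_match = first == second
```

Comparing with "==" gives the same answer on every Python implementation, whereas the result of "is" on integers is an implementation detail.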

Other contributions were also made from:

  • Jelle van der Waa:
    • Add support for comparing .zst files created by the Zstandard compression algorithm. (#34)
  • Mattia Rizzolo:
    • Install python3-all whilst running the autopkgtests as we want to run the tests against all supported Python versions.
    • Use apt-get instead of apt in our Dockerfile.
    • Add zstd to our test dependencies after the resolution of #34.

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb added file as a dependency for libfile-stripnondeterminism-perl (#945212), moved away from the deprecated $ADTTMP variable and made two uploads in total (1.6.2-1 & 1.6.3-1).

Project website

There was yet more effort put into our website this month, including:

Test framework

We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

  • Alexander Couzens (OpenWrt): Fix a typo in the kirkwood architecture.

  • Holger Levsen:

    • Debian:
      • Display newer suites first on pages showing the oldest build results.
      • Use the fully-qualified domain name (FQDN) when specifying hostnames in our list of offline nodes.
      • Reflect that coccia.debian.org has changed IP address.
      • Ignore the maximum transmission unit (MTU) on eth0 when checking for host health.
      • Perform the “/usr merge” variation in the unstable, experimental and bullseye distributions but not on buster.
    • FreeBSD: Upgrade the test VM to FreeBSD 12.1.

    • Arch Linux:
      • Don’t fail build jobs if the call to diffoscope --version fails; be a bit more verbose in the job output instead.
      • Attempt to be less error prone when ending schroot sessions.
    • OpenWrt:
      • Additionally build the brcm47xx, kirkwood, lantiq, mediatek, omap, sunxi and tegra targets.
      • Make build job outputs easier to read and thus understand.
      • Include the build target and subtarget in summary paragraphs at the top of report pages.
      • Add a reminder to fix the job URL later.
    • Misc:
      • Attempt to fix the PureOS package set.
      • Shorten a “HOWTO” header a tiny bit.
      • Drop hack to fix the clock.
      • Improve a script header; patches are even more welcome than bugs!
      • Disable the use of the OpenSSH ControlMaster feature to prevent Jenkins killing connections.
      • Make a number of improvements to our boilerplate texts/scripts.
  • Jelle van der Waa: Skip running the Arch Linux tests for continuous builds and rebuilds.

  • Mattia Rizzolo:
    • Set the maximum size for HTML pages generated by diffoscope to 1 MB (current default is 400 kB).
    • Update and improve the backup routines for the email relay system managing reproducible-builds.org.
  • Vagrant Cascadian:
    • Ensure OpenSSH authorized_keys files are processed in the correct directory regardless of where they are run from.
    • Reduce the level of parallelism on armhf systems with a lot of cores to reduce swapping on highly parallel builds, additionally ensuring that the levels of parallelism are odd and even numbers on the first and second builds respectively.

The usual node maintenance was performed by Holger Levsen.


Contributing

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:


This month’s report was written by Arnout Engelen, Chris Lamb, Holger Levsen, Jelle van der Waa, Bernhard M. Wiedemann and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



