Welcome to the November 2019 report from the Reproducible Builds project.
As a summary of our project, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised.
In this month’s report, we cover:
- Media coverage and events — Enter the Reproducibility Challenge, etc.
- Upstream news — OCaml, Mes, Maven, etc.
- Distribution work — The latest reports from Arch, Debian and openSUSE, etc.
- Software development — Holiday bonanza of patches, work on diffoscope, etc.
- Contributing — How to get in touch…
If you are interested in contributing to our project, please visit our Contribute page on our website.
Media coverage and events
We held our fifth annual Reproducible Builds summit between the 1st and 8th December in Marrakesh, Morocco. A full, in-depth report will be posted next month…
Chris Lamb was featured on The Manifest package management podcast in an episode called Reproducible Builds project and Debian package management.
ReScience C is an open-access journal that targets computational research and encourages the explicit replication of already published research. This month they announced their Ten Years Reproducibility Challenge which promotes the idea that old code — in this instance, a “scientific article [published] before January 1st 2010” — should also run on modern hardware and software in order to check one can obtain the same scientific results in the future.
Upstream news
There was fresh activity on an old pull request for the OCaml programming language regarding the usage and adoption of the BUILD_PATH_PREFIX_MAP environment variable that is used to ensure that software packages do not embed build-time paths into generated files. On the pull request in question, Gabriel Scherer was kind enough to provide many helpful examples of how to use the rewrite rules.
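To give a rough idea of how such rewrite rules behave, here is a minimal Python sketch. It assumes the "target=source" entry order, the colon separator and the "rightmost matching entry wins" rule as we understand them from the BUILD_PATH_PREFIX_MAP specification. It deliberately skips the %-escaping of special characters, and the paths are hypothetical. It is not the OCaml implementation.

```python
def parse_prefix_map(value):
    """Parse 'dst=src:dst=src' into a list of (src, dst) pairs.

    Simplification: the %-escapes that the specification defines for
    literal ':', '=' and '%' characters are NOT decoded here.
    """
    pairs = []
    for entry in value.split(":"):
        if not entry:
            continue  # tolerate empty entries, e.g. a leading ':'
        dst, sep, src = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {entry!r}")
        pairs.append((src, dst))
    return pairs


def map_path(path, pairs):
    """Rewrite `path` using the last (rightmost) entry whose source prefix matches."""
    for src, dst in reversed(pairs):
        if path.startswith(src):
            return dst + path[len(src):]
    return path  # no rule matched: leave the path untouched


# Hypothetical build directory mapped to a stable, build-independent name.
rules = parse_prefix_map("/srv/app=/home/alice/build")
print(map_path("/home/alice/build/src/main.ml", rules))
```

A tool that honours the variable would apply a mapping like `map_path` to every path it is about to embed in its output, so that two builds in different directories produce the same bytes.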
Capable of bootstrapping from a simple hex assembler all the way to a cross-platform C compiler. Work is still ongoing [to] result in a full bootstrap from a 357-byte bootstrap binary all the way to GCC.
Hervé Boutemy announced the release of three base Apache Maven plugins (maven-source-plugin, maven-jar-plugin and maven-assembly-plugin 3.2.0) to get Reproducible Builds as a “direct output” from this build system. For more information, please see the “Configuring for Reproducible Builds” section of their documentation.
Distribution work
A slight but temporary decline in the Arch Linux reproducibility status was determined to be due to a bug in the continuous integration framework where one build was building with --nocheck whilst the other was not, resulting in the test dependencies being installed on only one of the builds. This led to differences in the BUILDINFO file, which records the build dependencies.
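A difference of this kind is easy to spot by comparing the recorded dependencies of the two builds. The sketch below assumes a simple "key = value" layout with one "installed" line per package; the package names and versions are hypothetical.

```python
def installed_packages(buildinfo_text):
    """Collect the value of every 'installed = ...' line (assumed layout)."""
    installed = set()
    for line in buildinfo_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.strip() == "installed":
            installed.add(value.strip())
    return installed


# Hypothetical records: the first build also installed a test dependency.
first_build = """pkgname = example
installed = gcc-9.2.0-4-x86_64
installed = python-pytest-5.3.0-1-any
"""
second_build = """pkgname = example
installed = gcc-9.2.0-4-x86_64
"""

# Packages present in exactly one of the two builds.
only_in_one = installed_packages(first_build) ^ installed_packages(second_build)
print(sorted(only_in_one))
```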
Morten Linderud (Foxboron) wrote a blog post on the progress of reproducible builds for Arch packages, including how to reproduce packages and a roadmap of future work.
The standard Arch development tools package (devtools) now contains a new tool called makerepropkg which can reproduce a package from the Arch repositories given a seed
A lot of work has been put into getting the “[core]” system more reproducible; every package has been rebuilt with a new version of pacman which resolved a previous issue with storing the package size. Build failures and download issues have also been resolved, which has led to an increase in reproducible packages in this distribution’s continuous integration setup.
The report also summarises the current reproducibility status as follows:
In addition to this, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.
Thorsten Glaser filed a bug against the debhelper packaging library to request that it sets and exports a umask of 022 for all operations as a possible “harmonisation potential”. A varying umask can result in unreproducible packages as the file permissions on the build system can be embedded into archives generated by the build system.
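To illustrate why a varying umask matters, the following minimal Python sketch creates the same file under two different umasks and records the permission bits that end up in a tar archive. It assumes a POSIX system and is only a demonstration of the effect, not a description of how debhelper works.

```python
import io
import os
import tarfile
import tempfile


def archived_mode(mask):
    """Create a file under umask `mask` and return the mode stored for it in a tar archive."""
    previous = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example")
            # Request rw-rw-rw-; the kernel removes the bits set in the umask.
            os.close(os.open(path, os.O_CREAT | os.O_WRONLY, 0o666))
            buf = io.BytesIO()
            with tarfile.open(fileobj=buf, mode="w") as tar:
                tar.add(path, arcname="example")
            buf.seek(0)
            with tarfile.open(fileobj=buf) as tar:
                return tar.getmember("example").mode & 0o7777
    finally:
        os.umask(previous)  # always restore the caller's umask


print(oct(archived_mode(0o022)))
print(oct(archived_mode(0o002)))
```

The two archives differ only because of the environment they were built in, which is exactly the kind of variation a fixed umask removes.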
Vagrant Cascadian filed a bug against Lintian, the static analyser for Debian packages, to request that it checks for missing and/or unsigned .buildinfo files. He also uploaded the latest version of GNU Mes to the unstable distribution.
Software development
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Arnout Engelen:
- Bernhard M. Wiedemann:
  - abseil-cpp (sort the output of
  - brp-check-suse (to strip link-time optimisation (LTO) data from
  - buzztrax (report a parallelism/nondeterminism issue from GTK-Doc)
  - cardpeek (fix a previous patch)
  - cecilia (strip date and time in a
  - maven-plugin-bundle (fix a Java date)
  - .zip issue, already filed upstream)
  - opencensus-cpp (sort the result of
  - OpenSC (generate consistent DocBook identifiers)
  - pcc (fix a build failure from LTO in
  - perl-HTTP-Cookies (fix a build failure in 2025)
  - pocl (report compile-time CPU detection)
  - python-oslo.reports (drop unnecessary files with randomness)
  - vim (report a build failure when built without parallelism)
- Various updates to the RPM package manager:
- Chris Lamb:
  - #943954 filed against
  - #943956 filed against
  - #944131 filed against
  - #944214 filed against
  - #944520 filed against
  - #944782 filed against
  - #945105 filed against
  - #945576 filed against
  - #945822 filed against
- Vagrant Cascadian:
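Many of the patches above follow a small number of recurring patterns, such as sorting output that would otherwise depend on filesystem order, or replacing the current date with a fixed one. The Python sketch below shows both patterns in isolation. It is not taken from any of the patches listed, and the use of the SOURCE_DATE_EPOCH environment variable is our assumption about how a build would supply a fixed date.

```python
import os
import time


def list_sources(directory):
    """Return file names in a stable order; os.listdir() makes no ordering guarantee."""
    return sorted(os.listdir(directory))


def build_timestamp():
    """Prefer SOURCE_DATE_EPOCH over the current time, and always format in UTC."""
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    return time.strftime("%Y-%m-%d", time.gmtime(epoch))


print(list_sources("."))
print(build_timestamp())
```

Formatting in UTC matters as much as the fixed epoch: a local-time conversion would reintroduce a dependency on the build machine's timezone.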
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
133 were uploaded to Debian unstable by Chris Lamb. He also made the following changes:
- New features / improvements:
  - Allow all possible .zip file variations to return from external tools with non-zero exit codes, not just known types we can identify (e.g. Java
  - .buildinfo file matching to files in ASCII or UTF-8 format. (#77)
  - Bump the previous max_page_size limit from 400 kB to 4 MB. […]
  - Clarify in the HTML and text outputs that the limits are per-format, not global. (#944882)
  - Don’t use line-based buffering when communicating with subprocesses in “binary” mode. (#75)
- Regression fixes:
- Testsuite improvements:
  - Refresh the OCaml test fixtures to support versions greater than 4.08.1. […]
  - Update an Android manifest test to reflect that parsed XML attributes are returned in a new/sorted manner under Python 3.8. […]
  - Dramatically truncate the tcpdump expected diff to 8 KB from ~600 KB to reduce the size of the release tarball. […]
  - Add a self-test to encourage new test data files to be generated dynamically, or at least to ensure that no new ones are added without an explicit override. […]
  - Add a comment that the text_ascii2 fixture files are used in multiple tests so it is not trivial to remove/replace them. […]
  - Drop two more test fixture files for the directory tests. […]
  - Don’t run our self-test against the output of the Black source code reformatter with versions earlier than “ours” as it will generate different results. […]
  - Update an XML test for Python 3.8. […]
  - Drop an unused
- Code improvements:
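As background to the buffering change listed above: in Python, line-based buffering (bufsize=1) is only meaningful for text-mode streams, so a subprocess pipe that carries raw bytes should use the default buffer size instead. The following is a minimal sketch of that pattern and not diffoscope's actual code; the helper name and the echo command are hypothetical.

```python
import subprocess
import sys


def run_binary(cmd, data):
    """Feed raw bytes to a subprocess and return its raw standard output."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=-1,  # default buffer size; bufsize=1 (line-buffered) only applies in text mode
    )
    out, _ = proc.communicate(data)
    return out


# A tiny child process that copies its input to its output unchanged.
echo = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]
print(run_binary(echo, b"\x00\x01 binary data\n"))
```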
Other contributions were also made from:
- Jelle van der Waa:
- Mattia Rizzolo:
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb added file as a dependency for libfile-stripnondeterminism-perl (#945212) and moved away from the deprecated $ADTTMP variable […] and made two uploads in total (
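To give a flavour of what post-processing a completed build can look like, here is a small Python sketch that rewrites a .zip archive with a fixed timestamp and a sorted member order. It is purely illustrative: strip-nondeterminism itself is written in Perl, and we make no claim that it applies exactly these normalisations.

```python
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can represent


def normalise_zip(data):
    """Return a copy of the archive with sorted members and a fixed timestamp.

    Note: other per-member metadata (such as permissions) is not preserved here.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives that contain the same files but were created at different times, or with members added in a different order, become byte-for-byte identical after this step.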
There was yet more effort put into our website this month, including:
- Hervé Boutemy added a link to the Maven Guide to Configuring for Reproducible Builds to our JVM page. […]
- Display newer suites first on pages showing the oldest build results. […]
- Use the fully qualified-domain name (FQDN) when specifying hostnames in our list of offline nodes. […]
- Reflect that coccia.debian.org has changed IP address. […]
- Ignore the maximum transmission unit (MTU) on eth0 when checking for host health. […]
- Perform the “/usr merge” variation in the unstable, experimental and bullseye distributions but not on buster. […]
- Arch Linux:
- Attempt to fix the PureOS package set. […]
- Shorten a “HOWTO” header a tiny bit. […]
- Drop hack to fix the clock. […]
- Improve a script header; patches are even more welcome than bugs! […]
- Disable the use of the OpenSSH ControlMaster feature to prevent Jenkins killing connections. […]
- Make a number of improvements to our boilerplate texts/scripts. […][…][…]
- Mattia Rizzolo:
- Vagrant Cascadian:
  - Ensure OpenSSH authorized_keys files are processed in the correct directory regardless of where they are run from. […]
  - Reduce the level of parallelism on armhf systems with a lot of cores to reduce swapping on highly parallel builds, additionally ensuring that the levels of parallelism are odd and even numbers on the first and second builds respectively. […]
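The idea behind the parallelism change can be sketched in a few lines of Python. This is only our illustration of "odd on the first build, even on the second": the function name and the cap of 8 jobs are hypothetical and do not reflect the actual values or scripts used on our infrastructure.

```python
def jobs_for_build(cores, build, cap=8):
    """Return an odd job count for build 1 and an even one for build 2.

    `cap` is a hypothetical upper bound standing in for "reduce the level
    of parallelism on systems with a lot of cores".
    """
    if build not in (1, 2):
        raise ValueError("build must be 1 or 2")
    limit = max(1, min(cores, cap))
    odd = limit if limit % 2 == 1 else limit - 1
    # The second build uses one more job, so the two builds always differ.
    return odd if build == 1 else odd + 1


for cores in (1, 4, 16):
    print(cores, jobs_for_build(cores, 1), jobs_for_build(cores, 2))
```

Varying the job count between the two builds helps expose packages whose output depends on the level of parallelism.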
Contributing
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can also get in touch with us via:
This month’s report was written by Arnout Engelen, Chris Lamb, Holger Levsen, Jelle van der Waa, Bernhard M. Wiedemann and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.