Welcome to the October 2019 report from the Reproducible Builds project. 👌
In our monthly reports we attempt to outline the most important things that we have been up to recently. As a reminder of what our little project is all about: whilst anyone can inspect the source code of free software for malicious changes, most software is distributed to end users or servers as precompiled binaries. Reproducible builds tries to ensure that no changes have been made during these compilation processes by promising that identical results are always generated from a given source, allowing multiple third parties to come to a consensus on whether a build was compromised.
In this month’s report, we will cover:
- Media coverage & conferences — Reproducible builds in Belfast & science
- Reproducible Builds Summit 2019 — Registration & attendees, etc.
- Distribution work — The latest work in Debian, OpenWrt, openSUSE, and more…
- Software development — More diffoscope development, etc.
- Getting in touch — How to contribute & get in touch
If you are interested in contributing to our venture, please visit our Contribute page on our website.
Media coverage & conferences
Whilst not strictly related to reproducible builds, Sean Gallagher from Ars Technica wrote an article entitled Researchers find bug in Python script may have affected hundreds of studies:
A programming error in a set of Python scripts commonly used for computational analysis of chemistry data returned varying results based on which operating system they were run on.
Reproducible Builds Summit 2019
Similar to previous incarnations of the event, the heart of the workshop will be three days of moderated sessions with surrounding “hacking” days and will include a huge diversity of participants from Arch Linux, coreboot, Debian, F-Droid, GNU Guix, Google, Huawei, in-toto, MirageOS, NYU, openSUSE, OpenWrt, Tails, Tor Project and many more. We are still seeking additional sponsorship for the event. Sponsorship enables the attendance of people who would not otherwise be able to attend. If you or your company would be able to sponsor the event, please contact us.
If you would like to learn more about the event and how to register, please visit our dedicated event page.
Distribution work
GNU Guix announced that they had significantly reduced the size of their “bootstrap seed” by replacing binutils, GCC and glibc with smaller alternatives resulting in the package manager “possessing a formal description of how to build all underlying software” in a reproducible way from a mere 120MB seed.
OpenWrt is a Linux-based operating system targeting wireless network routers and other embedded devices. This month Paul Spooren (aparcar) posted a patch to their mailing list adding KCFLAGS to the kernel build flags to make it easier to rebuild the official binaries.
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution which describes how rpm was updated to run most builds with the -flto=auto argument, saving mirror disk space/bandwidth. In addition, maven-javadoc-plugin received a toolchain patch (originating from Debian) in order to normalise a date.
In Debian this month, Didier Raboud (OdyX) started a discussion on the debian-devel mailing list regarding building Debian source packages in a reproducible manner (thread index). In addition, Lukas Pühringer prepared an upload of in-toto, a framework to protect supply chain integrity by the Secure Systems Lab at New York University, which was uploaded by Holger Levsen.
Holger Levsen started a new section on the Debian wiki to centralise documentation of the progress made on various Debian-specific reproducibility issues and noticed that the “essential” package set in the bullseye distribution became unreproducible again, likely due to a bug in Perl itself. Holger also restarted a discussion on Debian bug #774415 which requests that the devscripts collection of utilities that “make the life of a Debian package maintainer easier” adds a script/wrapper to enable easier end-user testing of whether a package is reproducible.
Johannes Schauer (josch) explained that their mmdebstrap tool can create bit-for-bit identical Debian chroots of the unstable and buster distributions for both the minbase bootstrap “variants”. Bernhard M. Wiedemann also contributed to a discussion regarding adding a “global” build switch to enable/disable Profile-Guided Optimisation (PGO) and Link-time optimisation in the dpkg-buildflags tool, noting that “overall it is still very hard to get reproducible builds with PGO enabled.”
64 reviews of Debian packages were added, 10 were updated and 35 were removed this month adding to our knowledge about identified issues. Three new types were added by Chris Lamb (lamby):
Lastly, there was a far-reaching discussion regarding the correctness and suitability of setting the TZ environment variable to UTC when it was noted that the value UTC0 was “technically” more correct.
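To make the distinction concrete, here is a minimal Python sketch, not taken from the discussion itself. It assumes a Unix-like system (time.tzset is unavailable elsewhere), and the helper name utc_offset_seconds is hypothetical. The POSIX form of TZ is a zone name followed by an offset, so UTC0 spells out "a zone called UTC, zero hours from UTC", whereas a bare UTC typically relies on the C library finding a matching entry in its timezone database or falling back to a default.

```python
import os
import time


def utc_offset_seconds(tz_value):
    """Set TZ for this process and return the resulting offset from UTC in seconds.

    Hypothetical helper for illustration; time.tzset() only exists on Unix.
    """
    os.environ["TZ"] = tz_value
    time.tzset()
    return time.timezone


# "UTC0" is the explicit POSIX "name + offset" form, so no timezone
# database lookup is needed to interpret it.
offset = utc_offset_seconds("UTC0")
print("offset for UTC0:", offset)

# With the zone pinned, local time and UTC agree, so formatted timestamps
# no longer depend on the build machine's configured timezone.
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(0)))
```

Either spelling usually yields the same result on common Linux systems; the sketch only shows why the explicit form does not depend on what timezone data happens to be installed.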
Software development
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
sphinx-doc (nondeterminism from parallelism via Sphinx)
- A number of expiring SSL testing certificates have been extended to 2049 to fix future builds:
- Chris Lamb (lamby):
- #934698 filed against
- #941714 filed against
- #941715 filed against
- #941716 filed against
- #942005 filed against
- #942006 filed against
- #942009 filed against
- #942342 filed against
- #942479 filed against
- #942767 filed against
- #942847 filed against
- #942848 filed against
- #942867 & #942870 filed against
nodoc Debian build profiles).
- #943471 filed against
- #943674 filed against
- #943694 filed against
- #943829 filed against
- #943954 filed against
- #943956 filed against
- Mattias Ellert:
Lastly, a request from Steven Engler to sort fields in the PKG-INFO files generated by the setuptools Python module build utilities was resolved by Jason R. Coombs, and Vagrant Cascadian added SOURCE_DATE_EPOCH support to LTSP’s manual page generation.
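Both fixes follow common patterns, which can be sketched in Python as below. This is an illustration only, not the actual setuptools or LTSP change: the function names build_date and render_fields are hypothetical. The first honours SOURCE_DATE_EPOCH (a Unix timestamp supplied by the build environment) instead of the current time, and the second emits metadata fields in sorted order so that the output does not depend on insertion order.

```python
import os
import time


def build_date(environ=os.environ):
    """Return the date to embed in generated output, as YYYY-MM-DD in UTC.

    Uses SOURCE_DATE_EPOCH when set, otherwise falls back to the current time.
    """
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return time.strftime("%Y-%m-%d", time.gmtime(epoch))


def render_fields(fields):
    """Render "Key: value" lines in sorted key order for stable output."""
    return "\n".join(f"{key}: {fields[key]}" for key in sorted(fields))


print(build_date({"SOURCE_DATE_EPOCH": "1572566400"}))
print(render_fields({"Version": "1.0", "Name": "example"}))
```

Formatting the date with gmtime rather than localtime matters here, since it keeps the result independent of the build machine's timezone as well as its clock.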
strip-nondeterminism & reprotest
strip-nondeterminism is our tool to remove specific non-deterministic results from successful builds. This month, Chris Lamb made a number of changes, including uploading version 1.6.1-1 to Debian unstable. This dropped the bug_803503.zip test fixture as it is no longer compatible with the latest version of Perl’s Archive::Zip module (#940973).
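As an illustration of the kind of normalisation involved, the following Python sketch rewrites a ZIP archive so that member order, timestamps and permissions no longer vary between builds. It is not how strip-nondeterminism is implemented (that tool is written in Perl), and the names normalise_zip and FIXED_TIME, as well as the chosen fixed values, are assumptions made for this example.

```python
import io
import zipfile

# Arbitrary fixed timestamp (the earliest the ZIP format can represent).
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def normalise_zip(data):
    """Return a copy of the ZIP archive in `data` with stable metadata."""
    source = zipfile.ZipFile(io.BytesIO(data))
    output = io.BytesIO()
    with zipfile.ZipFile(output, "w") as target:
        # Sort members so the archive order does not depend on the
        # filesystem or on the order the build tool visited the files.
        for name in sorted(source.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed Unix permissions
            target.writestr(info, source.read(name))
    return output.getvalue()


def make_zip(names, timestamp):
    """Build a small in-memory archive, used only to demonstrate the idea."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            info = zipfile.ZipInfo(name, date_time=timestamp)
            archive.writestr(info, b"data-" + name.encode())
    return buffer.getvalue()


first = make_zip(["a.txt", "b.txt"], (2019, 10, 1, 12, 0, 0))
second = make_zip(["b.txt", "a.txt"], (2019, 10, 31, 8, 30, 0))
print(first == second)
print(normalise_zip(first) == normalise_zip(second))
```

The two input archives hold the same files but differ in order and timestamps; after normalisation they can be compared byte for byte.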
reprotest is our end-user tool to build the same source code twice in widely differing environments and then check the binaries produced by each build for any differences. This month, Iñaki Malerba updated our Salsa CI scripts […] as well as adding a --control-build parameter […]. Holger Levsen uploaded the package as 0.7.10, bumping the Debian “standards version” to
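The comparison step at the end of such a double build can be sketched in Python as follows. This is a simplified illustration rather than reprotest's own code: it assumes the two builds have already been run and their outputs placed in two directories, and the function names tree_digests and differing_files are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(build_a, build_b):
    """List files that are missing from one build or differ between the two."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result from differing_files means the two output trees are bit-for-bit identical; any file it does list is a candidate for closer inspection with a tool such as diffoscope.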
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb (lamby) made the following changes, including uploading versions 129 to the Debian unstable distribution:
Disassembling and reporting on files related to the R (programming language):
- Expose an .rdb file’s absolute paths in the semantic/human-readable output, not hidden deep in a hexdump. […]
- Rework and refactor the handling of .rdb files with respect to locating the parallel .rdx prior to inspecting the file to ensure that we do not add files to the user’s filesystem in the case of directly comparing two .rdb files or, worse, overwriting a file in its place. […]
- Query the container for the full path of the parallel .rdx file to the .rdb file as well as looking in the same directory. This ensures that comparing two Debian packages shows any varying path. […]
- Correct the matching of .rds files by also detecting newer versions of this file format. […]
- Don’t read the site and user environment when comparing .rds files by using
- Ensure all object names are displayed, including ones beginning with a fullstop (.) […] and sort package fields when dumping data from
- Mask/hide standard error when processing .rdb files […] and don’t include useless/misleading NULL when dumping data from them. […]
- Format package contents as foo = bar rather than using ugly and misleading brackets, etc. […] and include the object’s type […].
- Don’t pass our long script to parse .rdb files via the command line; use standard input instead. […]
- Call the deparse function to ensure that we do not error out and revert to a binary diff when processing .rdb files with internal “vector” types; they do not automatically coerce to strings. […]
- Other misc/cosmetic changes. […][…][…]
- When printing an error from a command, format the command for the user. […]
- Truncate very long command lines when displaying them as an external source of data. […]
- When formatting command lines ensure newlines and other metacharacters appear escaped as \n, etc. […][…]
- When displaying the standard error from commands, ensure we use the escaped version. […]
- Use “exit code” over “return code” terminology when referring to UNIX error codes in displayed differences. […]
- Internal API:
- Add ability to pass bytestring input to external commands. […]
- Split out command-line formatting into a separate utility function. […]
- Add support for easily masking the standard error of commands. […][…]
- To match the libarchive container, raise a KeyError exception if we request an invalid member from a directory. […]
- Correct string representation output in the traceback when we cannot locate a specific item in a container. […]
- Move build-dependency on python-argcomplete to its Python 3 equivalent to facilitate Python 2.x removal. (#942967)
- Track and report on missing Python modules. (#72)
- Move from deprecated $AUTOPKGTEST_TMP in the autopkgtests. […]
- Truncate the tcpdump expected diff to 8KB (from ~600KB). […]
- Try and ensure that new test data files are generated dynamically, i.e. at least no new ones are added without “good” reasons. […]
- Drop unused BASE_DIR global in the tests. […]
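The command-line display changes listed above (escaping metacharacters and truncating very long command lines) can be illustrated with a short Python sketch. This is not diffoscope's actual code: the function name format_command and the 80-character limit are assumptions chosen for the example.

```python
import shlex

MAX_LENGTH = 80  # hypothetical display limit


def format_command(args, max_length=MAX_LENGTH):
    """Render an argument list as a single, printable, bounded-length line."""
    # Quote each argument as a shell would need it, so the displayed
    # command is unambiguous about where arguments begin and end.
    quoted = " ".join(shlex.quote(arg) for arg in args)
    # Show newlines and other control characters as visible escapes
    # such as \n, so the command always occupies exactly one line.
    escaped = quoted.encode("unicode_escape").decode("ascii")
    if len(escaped) > max_length:
        escaped = escaped[: max_length - 6] + " [...]"
    return escaped


print(format_command(["echo", "first line\nsecond line"]))
print(format_command(["Rscript", "-e", "x" * 500]))
```

Escaping happens before truncation so that the length limit applies to what the user actually sees rather than to the raw argument bytes.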
In addition, Mattia Rizzolo updated our tests to run against all supported Python versions […] and to exit with a UNIX exit status of 2 instead of 1 in case of running out of disk space […]. Lastly, Vagrant Cascadian updated diffoscope 126 and 129 in GNU Guix, and updated inputs for additional test suite coverage.
trydiffoscope is the web-based version of diffoscope and this month Chris Lamb migrated the tool to depend on the python3-docutils package over python-docutils to allow for Python 2.x removal (#943293) as well as updating the packaging to the latest Debian standards and conventions […][…][…].
- Holger Levsen:
- Debian-specific changes:
- OpenWrt changes:
- Mattia Rizzolo:
- Paul Spooren (OpenWrt):
Getting in touch
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.