Welcome to the October 2019 report from the Reproducible Builds project. 👌
In our monthly reports we attempt to outline the most important things that we have been up to recently. As a reminder of what our little project is all about: whilst anyone can inspect the source code of free software for malicious changes, most software is distributed to end users or servers as precompiled binaries. Reproducible builds tries to ensure that no changes have been made during these compilation processes by promising that identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised.
In this month’s report, we will cover:
- Media coverage & conferences — Reproducible builds in Belfast & science
- Reproducible Builds Summit 2019 — Registration & attendees, etc.
- Distribution work — The latest work in Debian, OpenWrt, openSUSE, and more…
- Software development — More diffoscope development, etc.
- Getting in touch — How to contribute & get in touch
If you are interested in contributing to our venture, please visit our Contribute page on our website.
Media coverage & conferences
Jonathan McDowell gave an introduction on Reproducible Builds in Debian at the Belfast Linux User Group.
Whilst not strictly related to reproducible builds, Sean Gallagher from Ars Technica wrote an article entitled Researchers find bug in Python script may have affected hundreds of studies:
A programming error in a set of Python scripts commonly used for computational analysis of chemistry data returned varying results based on which operating system they were run on.
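The article is summarised here rather than analysed. However, one common cause of results that vary between operating systems is assuming that a directory listing comes back in a particular order. The following is a minimal Python sketch of making that order explicit. The `list_input_files` helper and its `*.out` pattern are hypothetical.

```python
import glob
import os


def list_input_files(directory, pattern="*.out"):
    """Return matching files in a stable, platform-independent order.

    glob.glob() yields entries in whatever order the operating system and
    filesystem list them, which can differ between machines; sorting the
    result removes that source of variation. (Hypothetical helper.)
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))
```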
Reproducible Builds Summit 2019
Registration for our fifth annual Reproducible Builds summit, which will take place between the 1st and 8th December in Marrakesh, Morocco, has opened and invitations have been sent out.
Similar to previous incarnations of the event, the heart of the workshop will be three days of moderated sessions with surrounding “hacking” days and will include a huge diversity of participants from Arch Linux, coreboot, Debian, F-Droid, GNU Guix, Google, Huawei, in-toto, MirageOS, NYU, openSUSE, OpenWrt, Tails, Tor Project and many more. We are still seeking additional sponsorship for the event. Sponsorship allows us to support the attendance of people who would not otherwise be able to attend. If you or your company would be able to sponsor the event, please contact info@reproducible-builds.org.
If you would like to learn more about the event and how to register, please visit our dedicated event page.
Distribution work
GNU Guix announced that they had significantly reduced the size of their “bootstrap seed” by replacing binutils, GCC and glibc with smaller alternatives. As a result, the package manager now possesses “a formal description of how to build all underlying software” in a reproducible way from a mere 120MB seed.
OpenWrt is a Linux-based operating system targeting wireless network routers and other embedded devices. This month Paul Spooren (aparcar) posted a patch to their mailing list adding KCFLAGS to the kernel build flags to make it easier to rebuild the official binaries.
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution, which describes how rpm was updated to run most builds with the -flto=auto argument, saving mirror disk space/bandwidth. In addition, maven-javadoc-plugin received a toolchain patch (originating from Debian) in order to normalise a date.
Debian
In Debian this month, Didier Raboud (OdyX) started a discussion on the debian-devel mailing list regarding building Debian source packages in a reproducible manner (thread index). In addition, Lukas Pühringer prepared an upload of in-toto, a framework by the Secure Systems Lab at New York University to protect supply chain integrity. The package was uploaded by Holger Levsen.
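The thread concerns Debian’s own tooling, and the sketch below does not reflect how dpkg builds source packages. It is only a generic Python illustration of the normalisations usually needed to make an archive such as a source tarball reproducible:

- a fixed member order;
- normalised ownership and permissions;
- timestamps clamped to a fixed value.

The `deterministic_tar` function and its `mtime` parameter are hypothetical. In practice, that value is often taken from SOURCE_DATE_EPOCH.

```python
import os
import tarfile


def deterministic_tar(src_dir, out_path, mtime):
    """Archive src_dir so that the output depends only on file contents.

    Hypothetical helper: mtime stands in for a value such as
    SOURCE_DATE_EPOCH; any newer timestamp is clamped down to it.
    """
    paths = []
    for root, dirs, files in os.walk(src_dir):
        paths.extend(os.path.join(root, name) for name in dirs + files)

    # Uncompressed on purpose: a gzip header would add its own timestamp.
    with tarfile.open(out_path, "w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(paths):  # order independent of the filesystem
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, src_dir))
            info.mtime = int(min(info.mtime, mtime))  # clamp timestamps
            info.uid = info.gid = 0  # normalise ownership
            info.uname = info.gname = ""
            if info.isdir():
                info.mode = 0o755
                tar.addfile(info)
            elif info.isfile():
                info.mode = 0o755 if info.mode & 0o111 else 0o644
                with open(path, "rb") as f:
                    tar.addfile(info, f)
            else:
                tar.addfile(info)  # e.g. symlinks, which carry no data
```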
Holger Levsen started a new section on the Debian wiki to centralise and document the progress made on various Debian-specific reproducibility issues. He also noticed that the “essential” package set in the bullseye distribution became unreproducible again, likely due to a bug in Perl itself. Holger also restarted a discussion on Debian bug #774415, which requests that the devscripts collection of utilities that “make the life of a Debian package maintainer easier” add a script/wrapper to enable easier end-user testing of whether a package is reproducible.
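Bug #774415 does not prescribe an implementation. The following is a hypothetical Python sketch of the comparison step such a wrapper would need. It assumes the package has already been built twice into two separate directories and lists every artifact whose SHA-256 digest differs, or that exists in only one build. A tool such as diffoscope can then explain why those files differ.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests


def unreproducible_files(first_build, second_build):
    """Return, sorted, the relative paths that differ or exist in only one build.

    Hypothetical helper, not part of devscripts.
    """
    a, b = tree_digests(first_build), tree_digests(second_build)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```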
Johannes Schauer (josch) explained that his mmdebstrap tool can create bit-for-bit identical Debian chroots of the unstable and buster distributions for both the essential and minbase bootstrap “variants”. Bernhard M. Wiedemann contributed to a discussion regarding adding a “global” build switch to the dpkg-buildflags tool to enable/disable Profile-Guided Optimisation (PGO) and Link-time optimisation, noting that “overall it is still very hard to get reproducible builds with PGO enabled.”
64 reviews of Debian packages were added, 10 were updated and 35 were removed this month, adding to our knowledge about identified issues. Three new issue types were added by Chris Lamb (lamby):

- nondeterministic_output_in_code_generated_by_ros_genpy
- nondeterministic_ordering_in_include_graphs_generated_by_doxygen
- nondeterministic_defaults_in_documentation_generated_by_python_traitlets
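The actual genpy, Doxygen and traitlets fixes are not reproduced here. As a generic, hypothetical illustration of nondeterministic ordering in generated output: by default, Python randomises string hashing per process (see PYTHONHASHSEED), so iterating over a set of strings can produce a different order on each run. A code generator that iterates over such a set can therefore emit different source each time. Sorting the names first makes the output stable. The `generate_slots` helper below is invented for this sketch.

```python
def generate_slots(field_names):
    """Emit a __slots__ line for a hypothetical generated class.

    field_names may be a set whose iteration order varies between runs
    because of hash randomisation; sorting makes the output identical.
    """
    names = ", ".join(repr(name) for name in sorted(field_names))
    return f"__slots__ = [{names}]"
```

For example, `generate_slots({"y", "x", "header"})` always returns `__slots__ = ['header', 'x', 'y']`.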
Lastly, there was a far-reaching discussion regarding the correctness and suitability of setting the TZ environment variable to UTC. The discussion arose when it was noted that the value UTC0 was “technically” more correct.
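For background, POSIX defines a TZ value as a zone name followed by an offset. UTC0 is therefore a complete specification: the name “UTC” with an offset of zero. A bare UTC has no offset, so its interpretation is left to the implementation (glibc, for example, first looks for a tz database file of that name).

The following is a minimal, Unix-only Python sketch that formats a timestamp under TZ=UTC0. It relies on `time.tzset()` and restores the previous TZ afterwards. The `utc_timestamp` helper is hypothetical.

```python
import os
import time


def utc_timestamp(seconds, tz="UTC0"):
    """Format a Unix timestamp as local time under the given TZ value.

    Hypothetical helper. time.tzset() exists only on Unix; the previous
    TZ setting is restored before returning.
    """
    old = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # make the C library re-read TZ
    try:
        return time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(seconds))
    finally:
        if old is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = old
        time.tzset()
```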
Software development
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - keeperrl (merged, date)
  - sphinx-doc (nondeterminism from parallelism via Sphinx)
  - vlc (sort tar)
  - A number of expiring SSL testing certificates have been extended to 2049 to fix future builds:
    - python-M2Crypto
    - python-aiosmtplib
    - python-distlib
    - python-geventhttpclient
    - python-moto (has a remaining year 2038 bug)
    - python-oslo.service
    - python-thriftpy2
- Chris Lamb (lamby):
  - #934698 filed against libchamplain (merged upstream).
  - #941714 filed against bst-external.
  - #941715 filed against checkinstall.
  - #941716 filed against gobject-introspection.
  - #942005 filed against elph.
  - #942006 filed against squeak-plugins-scratch.
  - #942009 filed against stgit (forwarded upstream).
  - #942342 filed against traitlets (forwarded upstream).
  - #942479 filed against frobby.
  - #942767 filed against python-oslo.reports.
  - #942847 filed against cloudkitty.
  - #942848 filed against designate.
  - #942867 & #942870 filed against r-base (not respecting nocheck and nodoc Debian build profiles).
  - #943471 filed against khard (forwarded upstream).
  - #943674 filed against flask (forwarded upstream).
  - #943694 filed against ros-genpy (forwarded upstream).
  - #943829 filed against pmemkv.
  - #943954 filed against tm-align.
  - #943956 filed against snakemake (forwarded upstream).
  - spirv-tools.
- Mattias Ellert:
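The report does not say why 2049 was chosen as the new expiry year, but two well-known date limits are relevant:

- RFC 5280 requires certificate validity dates through the year 2049 to be encoded as UTCTime, which uses a two-digit year. Dates in 2050 or later must use GeneralizedTime.
- A signed 32-bit time_t overflows one second after 2038-01-19 03:14:07 UTC. This is the kind of limit behind a “year 2038 bug”.

The following small Python sketch illustrates both boundaries. The `asn1_time_type` helper is hypothetical.

```python
from datetime import datetime, timezone


def asn1_time_type(not_after):
    """Return the ASN.1 time type RFC 5280 requires for a validity date.

    Hypothetical helper: dates up to and including 2049 use UTCTime
    (two-digit year); dates in 2050 or later use GeneralizedTime.
    """
    return "UTCTime" if not_after.year <= 2049 else "GeneralizedTime"


# Largest value of a signed 32-bit time_t; one second later it overflows.
Y2038 = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
```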
Lastly, a request from Steven Engler to sort fields in the PKG-INFO files generated by the setuptools Python module build utilities was resolved by Jason R. Coombs. Vagrant Cascadian also added SOURCE_DATE_EPOCH support to LTSP’s manual page generation.
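LTSP’s actual change is not shown here. The following is a hypothetical Python sketch of the general pattern the SOURCE_DATE_EPOCH specification describes for embedding a date, such as a manual page’s date, in build output:

1. Use SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set.
2. Otherwise, fall back to the current time.
3. In either case, format the result in UTC so that the build machine’s time zone does not leak into the output.

```python
import os
import time
from datetime import datetime, timezone


def manpage_date(fmt="%Y-%m-%d"):
    """Return the date to embed in generated documentation (hypothetical helper)."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    when = int(epoch) if epoch is not None else time.time()
    # Always format in UTC, independent of the build machine's TZ.
    return datetime.fromtimestamp(when, tz=timezone.utc).strftime(fmt)
```

With SOURCE_DATE_EPOCH=1572566400 set, `manpage_date()` returns 2019-11-01 on every machine.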
strip-nondeterminism & reprotest
strip-nondeterminism is our tool to remove specific non-deterministic results from successful builds. This month, Chris Lamb made a number of changes, including uploading version 1.6.1-1 to Debian unstable. This dropped the bug_803503.zip test fixture, as it is no longer compatible with the latest version of Perl’s Archive::Zip module (#940973).
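strip-nondeterminism itself is written in Perl, and its normalisers differ in detail from what follows. Purely to illustrate the idea for ZIP files, here is a minimal Python sketch that rewrites an archive with:

- members in sorted order;
- a fixed timestamp;
- fixed permissions.

It assumes regular-file members only and drops other metadata such as comments and extra fields. The `normalise_zip` helper is hypothetical.

```python
import io
import zipfile

# Earliest timestamp the ZIP format's DOS date fields can represent.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def normalise_zip(data):
    """Rewrite a ZIP archive (bytes) with sorted members and fixed metadata.

    Hypothetical helper; assumes every member is a regular file.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in sorted(src.namelist()):  # fixed member order
                info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
                info.compress_type = zipfile.ZIP_DEFLATED
                info.external_attr = 0o644 << 16  # fixed Unix permissions
                dst.writestr(info, src.read(name))
    return out.getvalue()
```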
reprotest is our end-user tool to build the same source code twice in widely differing environments and then check the binaries produced by each build for any differences. This month, Iñaki Malerba updated our Salsa CI scripts […] and added a --control-build parameter […]. Holger Levsen uploaded the package as 0.7.10, bumping the Debian “standards version” to 4.4.1 […].
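reprotest varies far more than this, using its own configurable set of variations. The following hypothetical Python sketch shows only the underlying idea:

1. Run the same “build” command once per environment variation. The two variations here (time zone and locale) are assumptions chosen for illustration.
2. Compare the SHA-256 digests of the outputs.

```python
import hashlib
import os
import subprocess

# Hypothetical environment variations, loosely modelled on the kinds of
# differences reprotest introduces; reprotest itself varies many more.
VARIATIONS = [
    {"TZ": "UTC0", "LC_ALL": "C"},
    {"TZ": "GMT+12", "LC_ALL": "C.UTF-8"},
]


def build_digests(argv):
    """Run argv once per variation; return the SHA-256 of each run's stdout."""
    digests = []
    for extra in VARIATIONS:
        env = dict(os.environ, **extra)
        result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
        digests.append(hashlib.sha256(result.stdout).hexdigest())
    return digests


def is_reproducible(argv):
    """True if every variation produced byte-identical output."""
    return len(set(build_digests(argv))) == 1
```

A command whose output depends on local time, such as one printing the hour of the Unix epoch, is reported as unreproducible because the two TZ values differ.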
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb (lamby) made the following changes, including uploading versions 126, 127, 128 and 129 to the Debian unstable distribution:
- Disassembling and reporting on files related to the R (programming language):
  - Expose an .rdb file’s absolute paths in the semantic/human-readable output, not hidden deep in a hexdump. […]
  - Rework and refactor the handling of .rdb files with respect to locating the parallel .rdx prior to inspecting the file. This ensures that we do not add files to the user’s filesystem when directly comparing two .rdb files or, worse, overwrite a file in its place. […]
  - Query the container for the full path of the parallel .rdx file to the .rdb file as well as looking in the same directory. This ensures that comparing two Debian packages shows any varying path. […]
  - Correct the matching of .rds files by also detecting newer versions of this file format. […]
  - Don’t read the site and user environment when comparing .rdx, .rdb or .rds files by using Rscript’s --vanilla option. […][…]
  - Ensure all object names are displayed, including ones beginning with a fullstop (.) […] and sort package fields when dumping data from .rdb files […].
  - Mask/hide standard error when processing .rdb files […] and don’t include useless/misleading NULL when dumping data from them. […]
  - Format package contents as “foo = bar” rather than using ugly and misleading brackets, etc. […] and include the object’s type […].
  - Don’t pass our long script to parse .rdb files via the command line; use standard input instead. […]
  - Call the deparse function to ensure that we do not error out and revert to a binary diff when processing .rdb files with internal “vector” types; they do not automatically coerce to strings. […]
  - Other misc/cosmetic changes. […][…][…]
- Output/logging:
  - When printing an error from a command, format the command for the user. […]
  - Truncate very long command lines when displaying them as an external source of data. […]
  - When formatting command lines, ensure newlines and other metacharacters appear escaped as \n, etc. […][…]
  - When displaying the standard error from commands, ensure we use the escaped version. […]
  - Use “exit code” over “return code” terminology when referring to UNIX error codes in displayed differences. […]
- Internal API:
  - Add ability to pass bytestring input to external commands. […]
  - Split out command-line formatting into a separate utility function. […]
  - Add support for easily masking the standard error of commands. […][…]
  - To match the libarchive container, raise a KeyError exception if we request an invalid member from a directory. […]
  - Correct string representation output in the traceback when we cannot locate a specific item in a container. […]
- Misc:
  - Move build-dependency on python-argcomplete to its Python 3 equivalent to facilitate Python 2.x removal. (#942967)
  - Track and report on missing Python modules. (#72)
  - Move from deprecated $ADTTMP to $AUTOPKGTEST_TMP in the autopkgtests. […]
  - Truncate the tcpdump expected diff to 8KB (from ~600KB). […]
  - Try and ensure that new test data files are generated dynamically, i.e. at least no new ones are added without “good” reasons. […]
  - Drop unused BASE_DIR global in the tests. […]
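Several of the output and internal API changes above can be illustrated with a small Python helper. This is a hypothetical sketch: `format_cmdline` and `run_command` are not diffoscope’s actual functions. The helper:

- passes a bytestring to a command’s standard input;
- can optionally mask its standard error;
- when the command fails, renders the command line as a single shell-quoted line, with newlines escaped and very long lines truncated, and reports the “exit code”.

```python
import shlex
import subprocess

# Characters that would otherwise break a single-line display of a command.
_ESCAPES = str.maketrans({"\n": "\\n", "\r": "\\r", "\t": "\\t"})


def format_cmdline(argv, max_len=80):
    """Render argv as one shell-quoted line, escaping newlines and truncating."""
    line = " ".join(shlex.quote(arg) for arg in argv).translate(_ESCAPES)
    if len(line) > max_len:
        line = line[: max_len - 3] + "..."
    return line


def run_command(argv, input_bytes=b"", mask_stderr=False):
    """Run argv with input_bytes on stdin, optionally discarding stderr."""
    result = subprocess.run(
        argv,
        input=input_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL if mask_stderr else subprocess.PIPE,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{format_cmdline(argv)} failed with exit code {result.returncode}"
        )
    return result.stdout
```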
In addition, Mattia Rizzolo updated our tests to run against all supported Python versions […] and to exit with a UNIX exit status of 2 instead of 1 in case of running out of disk space […]. Lastly, Vagrant Cascadian updated diffoscope to versions 126 and 129 in GNU Guix and updated inputs for additional test suite coverage.
trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb migrated the tool to depend on the python3-docutils package instead of python-docutils to allow for Python 2.x removal (#943293). He also updated the packaging to the latest Debian standards and conventions […][…][…].
Project website
There was yet more effort put into our website this month, including Chris Lamb improving the formatting of reports […][…][…][…][…] and tidying the new “Testing framework” links […], etc.
In addition, Holger Levsen added the Tor Project’s Reproducible Builds Manager to our “Who is Involved?” page and Mattia Rizzolo dropped a literal <br> HTML element […].
Test framework
We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:
- Holger Levsen:
- Mattia Rizzolo:
- Paul Spooren (OpenWrt):
The usual node maintenance was performed by Holger Levsen […][…], Mattia Rizzolo […][…][…] and Vagrant Cascadian […][…][…].
Getting in touch
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- Mailing list: rb-general@lists.reproducible-builds.org
- IRC: #reproducible-builds on irc.oftc.net
- Twitter: @ReproBuilds
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.