Welcome to the September 2019 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. As a quick refresher of what our project is about, whilst anyone can inspect the source code of free software for malicious changes, most software is distributed to end users or servers as precompiled binaries. The motivation behind the reproducible builds effort is to ensure zero changes have been introduced during these compilation processes. This is achieved by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
In September’s report, we cover:
- Media coverage & events — more presentations, preventing Stuxnet, etc.
- Upstream news — kernel reproducibility, grafana, systemd, etc.
- Distribution work — reproducible images in Arch Linux, policy changes in Debian, etc.
- Software development — yet more work on diffoscope, upstream patches, etc.
- Misc news & getting in touch — from our mailing list, how to contribute, etc.
If you are interested in contributing to our project, please visit our Contribute page on our website.
Media coverage & events
This month Vagrant Cascadian attended the 2019 GNU Tools Cauldron in Montréal, Canada and gave a presentation entitled Reproducible Toolchains for the Win (video).
In addition, our project was highlighted as part of a presentation by Andrew Martin at the All Systems Go conference in Berlin titled Rootless, Reproducible & Hermetic: Secure Container Build Showdown, and Björn Michaelsen from the Document Foundation presented at the 2019 LibreOffice Conference in Almería in Spain on the status of reproducible builds in the LibreOffice office suite.
In academia, Anastasis Keliris and Michail Maniatakos from the New York University Tandon School of Engineering published a paper titled ICSREF: A Framework for Automated Reverse Engineering of Industrial Control Systems Binaries (PDF) that speaks to concerns regarding the security of Industrial Control Systems (ICS) such as those attacked via Stuxnet. The paper outlines their ICSREF tool for reverse-engineering binaries from such systems and furthermore demonstrates a scenario whereby a commercial smartphone equipped with ICSREF could be easily used to compromise such infrastructure.
Lastly, it was announced that Vagrant Cascadian will present a talk at SeaGL in Seattle, Washington during November titled There and Back Again, Reproducibly.
2019 Summit
Registration for our fifth annual Reproducible Builds summit that will take place between 1st → 8th December in Marrakesh, Morocco has opened and personal invitations have been sent out.
Similar to previous incarnations of the event, the heart of the workshop will be three days of moderated sessions with surrounding “hacking” days and will include a huge diversity of participants from Arch Linux, coreboot, Debian, F-Droid, GNU Guix, Google, Huawei, in-toto, MirageOS, NYU, openSUSE, OpenWrt, Tails, Tor Project and many more. If you would like to learn more about the event and how to register, please visit our dedicated event page.
Upstream news
Ben Hutchings added documentation to the Linux kernel regarding how to make the build reproducible. As he mentioned in the commit message, the kernel is “actually” reproducible but the end-to-end process was not previously documented in one place and thus Ben describes the workflow and environment needed to ensure a reproducible build.
Daniel Edgecumbe submitted a pull request, which was subsequently merged, to the logging/journaling component of systemd so that the output of e.g. journalctl --update-catalog does not differ between subsequent runs despite there being no changes in the input files.
Jelle van der Waa noticed that if the grafana monitoring tool was built within a source tree devoid of Git metadata then the current timestamp was used instead, leading to an unreproducible build. To avoid this, Jelle submitted a pull request so that it uses SOURCE_DATE_EPOCH if available.
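The general pattern behind such a fix can be sketched as follows. This is a minimal Python illustration rather than the actual grafana change, and the build_timestamp helper is a hypothetical name: prefer the SOURCE_DATE_EPOCH environment variable when it is set and only fall back to the current time otherwise.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the timestamp to embed in build output (hypothetical helper).

    Prefers SOURCE_DATE_EPOCH (seconds since the Unix epoch) when set and
    only falls back to the current time otherwise.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        seconds = int(epoch)
    else:
        seconds = int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# With the variable set, the embedded date no longer depends on when the build runs.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1567296000"})
print(stamp.isoformat())  # 2019-09-01T00:00:00+00:00
```

Build systems typically export SOURCE_DATE_EPOCH from the date of the last source change (for example the latest changelog entry), so the value is the same for everyone who builds the same source.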
Mes (a Scheme-based compiler for our “sister” bootstrappable builds effort) announced their 0.20 release.
Distribution work
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. The Thunderbird and kernel-vanilla packages will be among the larger ones to become reproducible soon, and there were additional Python patches to help with reproducibility issues in modules written in this language that have C bindings.
OpenWrt is a Linux-based operating system targeting embedded devices such as wireless network routers. This month, Paul Spooren (aparcar) switched the toolchain to use GCC version 8 by default in order to support the -ffile-prefix-map= option, which permits a varying build path without affecting the binary result of the build […]. In addition, Paul updated the kernel-defaults package to ensure that the SOURCE_DATE_EPOCH environment variable is considered when creating the /init directory.
Alexander “lynxis” Couzens began working on a set of build scripts for creating firmware and operating system artifacts in the coreboot distribution.
Lukas Pühringer prepared an upload of python-securesystemslib version 0.11.3-1 to Debian unstable, which was sponsored by Holger Levsen. python-securesystemslib is a dependency of in-toto, a framework to protect the integrity of software supply chains.
Arch Linux
The mkinitcpio component of Arch Linux was updated by Daniel Edgecumbe so that it generates reproducible initramfs images by default, meaning that two subsequent runs of mkinitcpio produce two files that are identical at the binary level. The commit message elaborates on its methodology:
Timestamps within the initramfs are set to the Unix epoch of 1970-01-01. Note that in order for the build to be fully reproducible, the compressor specified (e.g. gzip, xz) must also produce reproducible archives. At the time of writing, as an inexhaustive example, the lzop compressor is incapable of producing reproducible archives due to the insertion of a runtime timestamp.
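The effect of a compressor embedding a timestamp can be illustrated with a short sketch. It uses Python's gzip module as a stand-in (mkinitcpio itself does not work this way) and the gzip_bytes helper is a hypothetical name. The gzip header carries a modification time, so fixing it yields identical archives while letting it vary does not.

```python
import gzip
import io


def gzip_bytes(data, mtime):
    """Compress data with gzip, writing the given mtime into the header."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=mtime) as archive:
        archive.write(data)
    return buffer.getvalue()


payload = b"initramfs contents\n"

# A fixed mtime gives byte-identical output on every run.
assert gzip_bytes(payload, mtime=0) == gzip_bytes(payload, mtime=0)

# A differing timestamp changes the archive although the payload is the same.
assert gzip_bytes(payload, mtime=1) != gzip_bytes(payload, mtime=2)
```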
In addition, a bug was created to track progress on making the Arch Linux ISO images reproducible.
Debian
In July, Holger Levsen filed a bug against the underlying tool that maintains the Debian archive (“dak”) after he noticed that .buildinfo metadata files were not being automatically propagated in the case that packages had to be manually approved in “NEW queue”. After it was pointed out that the files were being retained in a separate location, Benjamin Hof proposed a patch for the issue that was merged and deployed this month.
Aurélien Jarno filed a bug against the Debian Policy (#940234) to request a section be added regarding the reproducibility of source packages. Whilst there is already a section about reproducibility in the Policy, it only mentions binary packages. Aurélien suggests that it:
… might be a good idea to add a new requirement that repeatedly building the source package in the same environment produces identical .dsc files.
In addition, 51 reviews of Debian packages were added, 22 were updated and 47 were removed this month, adding to our knowledge about identified issues. Many issue types were added by Chris Lamb including buildpath_in_code_generated_by_bison, buildpath_in_postgres_opcodes and ghc_captures_build_path_via_tempdir.
Software development
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - blender (Python date)
  - buzztrax (shell date)
  - colobot-data (sort a Python readdir, forwarded upstream)
  - enblend-enfuse (date/host/user)
  - gmsh (hostname/username)
  - griefly (sort a Python readdir, forwarded upstream)
  - guile (disable parallelism)
  - latex2html (drop LaTeX log file with date)
  - MozillaFirefox (make Profile-Guided Optimisation optional)
  - MozillaThunderbird (Python date)
  - ninja (build failure when built without parallelism)
  - python-futures (fix build failure)
  - python-holidays (fix build failure in 2020)
  - python-iminuit (sort a Python glob)
  - python-ioflo (fix build failure via security certificate renewal)
  - python-keystoneauth1 (fix build failure in 2020)
  - python-openstackdocstheme (date issue)
  - python3/python (toolchain, sort readdir)
  - volk (report compile-time CPU-detection)
  - wget2 (drop a build date)
- Chris Lamb (“lamby”):
  - #939546 filed against libnbd (forwarded upstream)
  - #939547 filed against libubootenv (forwarded upstream)
  - #939548 filed against dsdp.
  - #939549 filed against sdaps (forwarded upstream)
  - #939650 filed against libvdpau.
  - #940013 filed against apophenia.
  - #940156 filed against pydantic (forwarded upstream)
  - #940639 filed against vala-panel.
  - #941072 filed against kivy.
  - #941116 filed against fathom.
  - Several libguestfs components have received a patch to support SOURCE_DATE_EPOCH.
- Rebecca N. Palmer:
  - #941309 filed against node-browserify-lite.
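Several of the patches listed above sort the result of a readdir or glob call. The underlying issue is that directory listings come back in an arbitrary, filesystem-dependent order, so any output derived from that order can differ between builds. A minimal sketch of the usual fix, with list_sources as a hypothetical helper name rather than code from any of the patched packages:

```python
import os
import tempfile


def list_sources(directory):
    """Return file names in a stable order.

    os.listdir() returns entries in arbitrary order, so sort them
    before the names can influence the build output.
    """
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.dat", "c.dat", "a.dat"):
        open(os.path.join(tmp, name), "w").close()
    print(list_sources(tmp))  # ['a.dat', 'b.dat', 'c.dat']
```

The same applies to glob.glob(), whose results can be wrapped in sorted() in the same way.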
Diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb uploaded versions 123, 124 and 125 and made the following changes:
- New features:
  - Add /srv/diffoscope/bin to the Docker image path. (#70)
  - When skipping tests due to the lack of an installed tool, print the package that might provide it. […]
  - Update the “no progressbar” logging message to match the parallel missing tlsh module warnings. […]
  - Update “requires foo” messages to clarify that they are referring to Python modules. […]
- Testsuite updates:
  - The test_libmix_differences ELF binary test requires the xxd tool. (#940645)
  - Build the OCaml test input files on-demand rather than shipping them with the package in order to prevent test failures with OCaml 4.08. (#67)
  - Also conditionally skip the identification and “no differences” tests as we require the OCaml compiler to be present when building the test files themselves. (#940471)
  - Rebuild our test squashfs images to exclude the character device as they require root or fakeroot to extract. (#65)
- Many code cleanups, including dropping some unnecessary control flow […], dropping unnecessary pass statements […] and dropping explicitly inheriting from the object class as it is unnecessary in Python 3 […].
In addition, Marc Herbert completely overhauled the handling of ELF binaries, particularly around the many assumptions that were previously being made via file extensions, etc. […][…][…] and updated the testsuite to support a newer version of the coreboot utilities […]. Mattia Rizzolo then ensured that diffoscope does not crash when the progress bar module is missing but the functionality was requested […] and made our version checking code more lenient […]. Lastly, Vagrant Cascadian not only updated diffoscope to versions 123 and 125 but also enabled a more complete test suite in the GNU Guix distribution. […][…][…][…][…][…]
Project website
There was yet more effort put into our website this month, including:
- Chris Lamb:
- Holger Levsen:
- Add a link to our style guide on our “tools” page. […]
- Rework the handling of news/events, including adding a news archive page […] and differentiating between news and reports on the homepage […].
- Large number of changes to the “Who is Involved?” page, including adding a link to F-Droid’s verification server […] and their verification tool for end-users […] as well as adding the Civil Infrastructure Project (CIP) as a sponsor […]
- Include a link to our testing framework in all navigation elements. […]
- Add/improve a number of presentation entries on our Talks & Resources page. […][…][…][…][…]
In addition, Cindy Kim added in-toto to our “Who is Involved?” page, James Fenn updated our homepage to fix a number of spelling and grammar issues […] and Peter Conrad added BitShares to our list of projects interested in Reproducible Builds […].
strip-nondeterminism
strip-nondeterminism is our tool to remove specific non-deterministic results from successful builds. This month, Marc Herbert made a huge number of changes including:
- GNU ar handler:
- Don’t corrupt the pseudo file mode of the symbols table.
- Add test files for “symtab” (/) and long names (//).
- Don’t corrupt the SystemV/GNU table of long filenames.
- Add a new $File::StripNondeterminism::verbose global and, if enabled, tell the user that ar(1) could not set the symbol table’s mtime.
In addition, Chris Lamb performed some issue investigation with the Debian Perl Team regarding issues in the Archive::Zip module, including a problem with corruption of members that use bzip compression as well as a regression whereby various metadata fields were not being updated that was reported in/around Debian bug #940973.
Test framework
We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org.
- Alexander “lynxis” Couzens:
- Holger Levsen:
- Correctly handle the $DEBUG variable in OpenWrt builds. […]
- Refactor and notify the #archlinux-reproducible IRC channel for problems in this distribution. […]
- Ensure that only one mail is sent when rebooting nodes. […]
- Unclutter the output of a Debian maintenance job. […]
- Drop a “todo” entry as we have been varying on a merged /usr for some time now. […]
In addition, Paul Spooren added an OpenWrt snapshot build script which downloads .buildinfo and related checksums from the relevant download server and attempts to rebuild and then validate them for reproducibility. […]
The usual node maintenance was performed by Holger Levsen […][…][…], Mattia Rizzolo […] and Vagrant Cascadian […][…].
reprotest
reprotest is our end-user tool to build the same source code twice in different environments and then check the binaries produced by each build for differences. This month, a change by Dmitry Shachnev was merged to not use the faketime wrapper at all when asked to not vary time […] and Holger Levsen subsequently released this as version 0.7.9 as well as dramatically overhauling the packaging […][…].
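The idea of building twice under varied conditions and comparing the results can be sketched in a few lines. This is a toy illustration only and not how reprotest is implemented: both build functions and the environment values are hypothetical, and a real tool varies many more things (time, locale, build path and so on).

```python
import hashlib


def toy_build(source, environ):
    """Hypothetical build step that ignores its environment."""
    return source.upper().encode("utf-8")


def leaky_build(source, environ):
    """Hypothetical build step that leaks the build user into its output."""
    return (source.upper() + " built by " + environ["USER"]).encode("utf-8")


def is_reproducible(build, source):
    """Build twice under deliberately different environments and compare digests."""
    first = build(source, {"USER": "alice", "TZ": "UTC"})
    second = build(source, {"USER": "bob", "TZ": "Asia/Tokyo"})
    return hashlib.sha256(first).digest() == hashlib.sha256(second).digest()


print(is_reproducible(toy_build, "hello"))    # True
print(is_reproducible(leaky_build, "hello"))  # False
```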
Misc news & getting in touch
On our mailing list Rebecca N. Palmer started a thread titled Addresses in IPython output which points out and attempts to find a solution to a problem with Python packages, whereby objects that don’t have an explicit string representation have a default one that includes their memory address. This causes problems with reproducible builds if/when such output appears in generated documentation.
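The problem is easy to demonstrate. The sketch below uses hypothetical class and function names and is not taken from the mailing list thread. It shows the default representation with its memory address, followed by two possible mitigations: giving the class an explicit __repr__, or post-processing generated text to mask addresses.

```python
import re


class Widget:
    """A class without __repr__, so Python falls back to the default."""


class NamedWidget:
    """A class with an explicit, address-free string representation."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"NamedWidget(name={self.name!r})"


def strip_addresses(text):
    """Replace ' at 0x...' memory addresses with a fixed placeholder."""
    return re.sub(r" at 0x[0-9A-Fa-f]+", " at 0x...", text)


print(repr(Widget()))                   # contains an address that may differ between runs
print(strip_addresses(repr(Widget())))  # address masked, so the text is stable
print(repr(NamedWidget("demo")))        # NamedWidget(name='demo')
```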
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mailing list: rb-general@lists.reproducible-builds.org
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.