Welcome to the July 2019 report from the Reproducible Builds project!
In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.
The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
In July’s report, we cover:
- Front page — Media coverage, upstream news, etc.
- Distribution work — Shenanigans at DebConf19
- Software development — Software transparency, yet more diffoscope work, etc.
- On our mailing list — GNU tools, education and buildinfo files
- Getting in touch — … and how to contribute
If you are interested in contributing to our project, we enthusiastically invite you to visit our Contribute page on our website.
Front page
Nico Alt wrote a detailed and well-researched article titled “Trust is good, control is better”, which discusses reproducible builds in F-Droid, the alternative application repository for Android mobile phones. In contrast to the bigger commercial app stores, F-Droid only offers apps that are free and open source software. The post not only demonstrates using diffoscope but talks more generally about how reproducible builds can prevent single developers or other important centralised infrastructure from becoming targets for toolchain-based attacks.
Later in the month, F-Droid’s aforementioned reproducibility status was mentioned on episode 68 of the Late Night Linux podcast. (direct link to 14:12)
Morten (“Foxboron”) Linderud published his academic thesis “Reproducible Builds: break a log, good things come in trees” which investigates and describes how transparency log overlays can provide additional security guarantees for computers automatically producing software packages. The thesis was part of Morten’s studies at the University of Bergen, Norway and is an extension of the work New York University Tandon School of Engineering has been doing with package rebuilder integration in APT.
Mike Hommey posted to his blog about Reproducing the Linux builds of Firefox 68, taking advantage of the fact that builds shipped by Mozilla should be reproducible from this version onwards. He discusses the problems caused by the builds being optimised with Profile-Guided Optimisation (PGO) but, armed with the now-published profiling data, Mike provides Docker-based instructions on how to reproduce the published builds yourself.
Joel Galenson has been making progress on a reproducible Rust compiler which includes support for a --remap-path-prefix argument, related to the concepts and problems involved in the BUILD_PATH_PREFIX_MAP proposal to fix issues with build paths being embedded in binaries.
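To illustrate the idea behind build-path prefix mapping, here is a minimal Python sketch. It assumes, as we understand the BUILD_PATH_PREFIX_MAP proposal, that the variable holds `dst=src` pairs separated by `:` and that later pairs take precedence over earlier ones. The function names are hypothetical, matching is a simple string-prefix test, and the proposal's %-escaping of special characters is deliberately left out.

```python
def parse_prefix_map(value):
    """Parse "dst=src:dst=src" into a list of (src, dst) pairs.

    Simplification (assumption): the %-escaping that the proposal defines
    for paths containing '=', ':' or '%' is not handled here."""
    pairs = []
    for item in value.split(":"):
        dst, sep, src = item.partition("=")
        if sep and src:
            pairs.append((src, dst))
    return pairs


def remap_path(path, pairs):
    """Rewrite a build-path prefix; the rightmost matching pair wins."""
    for src, dst in reversed(pairs):
        if path.startswith(src):
            return dst + path[len(src):]
    return path


if __name__ == "__main__":
    pairs = parse_prefix_map("/usr/src/foo=/home/alice/build/foo")
    print(remap_path("/home/alice/build/foo/src/main.c", pairs))  # /usr/src/foo/src/main.c
```

With such a mapping, two builds made in different directories can embed the same path in their output.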
Lastly, Alessio Treglia posted to their blog about Cosmos Hub and Reproducible Builds, which describes the reproducibility work happening in the Cosmos Hub, a network of interconnected blockchains. Specifically, Alessio talks about work being done on the Gaia development kit for the Hub.
Distribution work
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. Enabling Link Time Optimization (LTO) in this distribution’s “Tumbleweed” branch caused multiple issues because the number of cores on the build host was being added to the CFLAGS variable. This affected, for example, a debuginfo/rpm header and also resulted in CFLAGS appearing in built binaries such as fldigi, gmp, haproxy, etc.
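The toy Python sketch below shows why a host-dependent value in CFLAGS breaks reproducibility: if the compiler records its flags in its output, two hosts with different core counts produce different bytes. The helper names, the `-flto=N` flag layout and the normalisation step are illustrative assumptions, not the actual openSUSE fix.

```python
import re


def cflags_for_host(ncpus: int) -> str:
    # Hypothetical: the LTO job count is derived from the build host's core count
    return f"-O2 -g -flto={ncpus}"


def toy_binary(cflags: str) -> bytes:
    # Stand-in for a compiler that records its command-line flags in the binary
    return b"\x7fELF...compiled with: " + cflags.encode()


def normalise_cflags(cflags: str) -> str:
    # Illustrative only: drop the host-specific job count from -flto=N
    return re.sub(r"-flto=\d+", "-flto", cflags)


if __name__ == "__main__":
    a = toy_binary(cflags_for_host(4))
    b = toy_binary(cflags_for_host(16))
    print("raw flags reproducible:", a == b)  # False
    na = toy_binary(normalise_cflags(cflags_for_host(4)))
    nb = toy_binary(normalise_cflags(cflags_for_host(16)))
    print("normalised flags reproducible:", na == nb)  # True
```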
As highlighted in last month’s report, the OpenWrt project (a Linux operating system targeting embedded devices such as wireless network routers) hosted a summit in Hamburg, Germany. Their full summit report and roundup is now available; it covers many general aspects of that distribution, including the work on reproducible builds that was done during the event.
Debian
It was an extremely productive time in Debian this month in and around DebConf19, the 20th annual conference for both contributors and users, which was held at the Federal University of Technology in Paraná (UTFPR) in Curitiba, Brazil, from July 21 to 28. The conference was preceded by “DebCamp” from the 14th until the 19th, with an additional “Open Day” aimed at the general public on the 20th.
There were a number of talks touching on the topic of reproducible builds and secure toolchains throughout the conference, including:
- Reproducible Builds - aiming for bullseye by Holger Levsen, Chris Lamb and Vagrant Cascadian.
- Software transparency: improving package manager security presented by Benjamin Hof.
- Software transparency BoF, an informal “birds of a feather” session to discuss and collect ideas around detecting compromised archives.
There were naturally countless discussions regarding Reproducible Builds in and around the conference on the questions of tooling, infrastructure and our next steps as a project.
The release of Debian 10 buster also means that the release cycle for the next release (codenamed “bullseye”) has just begun. As part of this, the Release Team recently announced that Debian will no longer allow binaries built and uploaded by maintainers on their own machines to be part of the upcoming release. This is great news not only for toolchain security in general but also because all binaries that form part of this release will likely have a .buildinfo file and thus metadata that could be used by others to reproduce and verify the builds.
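As a rough illustration of how such metadata can be used, the Python sketch below parses a simplified, unsigned .buildinfo stanza and checks a rebuilt artifact against its Checksums-Sha256 field. The field names follow the .buildinfo format; the parser itself is a hypothetical minimal version that ignores OpenPGP signatures and other edge cases a real tool would need to handle.

```python
import hashlib


def parse_stanza(text):
    """Parse one deb822-style stanza into a dict (simplified, unsigned input)."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Continuation line: belongs to the previous field
            fields[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def sha256_checksums(fields):
    """Map file name -> SHA-256 digest from the Checksums-Sha256 field."""
    sums = {}
    for entry in fields.get("Checksums-Sha256", "").splitlines():
        if entry.strip():
            digest, _size, name = entry.split()
            sums[name] = digest
    return sums


def matches_buildinfo(fields, name, data):
    """True if the rebuilt artifact has the digest recorded in the .buildinfo."""
    return sha256_checksums(fields).get(name) == hashlib.sha256(data).hexdigest()
```

A verifier would rebuild the package in the environment described by the same file (for example, its Installed-Build-Depends field) and then call `matches_buildinfo` on each resulting .deb.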
Holger Levsen filed a bug against the underlying tool that maintains the Debian archive (“dak”) after he noticed that .buildinfo metadata files were not being automatically propagated if packages had to be manually approved or processed in the so-called “NEW queue”. After it was pointed out that the files were being retained in a separate location, Benjamin Hof proposed a potential patch for the issue, which is pending review.
David Bremner wrote a blog post about “Yet another buildinfo database” that provides a SQL interface for querying .buildinfo attestation documents, particularly focusing on identifying packages that were built with a specific, and possibly buggy, build-dependency. Later at DebConf, David demonstrated his tool live (starting at 36:30).
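A query of the kind described above might look like the following sketch, which uses Python's built-in sqlite3 module. The table layout and helper names are hypothetical and are not taken from David's tool; entries are assumed to have the `name (= version)` form used in the Installed-Build-Depends field.

```python
import re
import sqlite3

# Matches entries such as "gcc-9 (= 9.1.0-1)" in Installed-Build-Depends
DEP_RE = re.compile(r"([^\s(,]+)\s*\(=\s*([^)\s]+)\s*\)")


def parse_installed_build_depends(value):
    """Return (name, version) tuples from an Installed-Build-Depends value."""
    return DEP_RE.findall(value)


def make_db(buildinfos):
    """buildinfos: iterable of (package, version, installed_build_depends)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE build_deps "
        "(package TEXT, version TEXT, dep_name TEXT, dep_version TEXT)"
    )
    for package, version, deps in buildinfos:
        rows = [(package, version, name, dep_version)
                for name, dep_version in parse_installed_build_depends(deps)]
        db.executemany("INSERT INTO build_deps VALUES (?, ?, ?, ?)", rows)
    return db


def built_with(db, dep_name, dep_version):
    """Return (package, version) pairs built against a given build-dependency version."""
    return db.execute(
        "SELECT DISTINCT package, version FROM build_deps "
        "WHERE dep_name = ? AND dep_version = ? ORDER BY package",
        (dep_name, dep_version),
    ).fetchall()
```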
Ivo de Decker (“ivodd”) scheduled rebuilds of over 600 packages that were last uploaded to the archive in December 2016 or earlier. This was so that they would be built using a version of the low-level dpkg package build tool that supports the generation of reproducible binary packages. The effect of this on the main archive will be deliberately staggered and thus visible throughout the upcoming weeks, potentially resulting in some of these packages now failing to build.
Joaquin de Andres posted an update regarding the work being done on continuous integration on Debian’s GitLab instance at DebConf19 in which he mentions, inter alia, a tool called atomic-reprotest. This is a relatively new utility to help debug failures logged by our reprotest tool, which attempts to test whether a build is reproducible or not. This tool was also mentioned in a subsequent lightning talk.
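The general approach of testing a build under varied conditions, and then narrowing a failure down to a single variation, can be sketched in Python as follows. This is a hypothetical illustration of the idea behind tools like reprotest and atomic-reprotest, not their actual implementation; `toy_build` stands in for a real package build.

```python
import hashlib


def toy_build(env):
    # Hypothetical build whose output embeds the timezone but ignores the locale
    return ("built with TZ=" + env.get("TZ", "UTC")).encode()


def output_digest(build, env):
    return hashlib.sha256(build(env)).hexdigest()


def is_reproducible(build, base_env, variations):
    """Compare a reference build against one with all variations applied at once."""
    varied = {**base_env, **variations}
    return output_digest(build, base_env) == output_digest(build, varied)


def find_culprits(build, base_env, variations):
    """Apply each variation on its own to see which ones change the output."""
    reference = output_digest(build, base_env)
    return [
        name
        for name, value in variations.items()
        if output_digest(build, {**base_env, name: value}) != reference
    ]


if __name__ == "__main__":
    base = {"TZ": "UTC", "LANG": "C"}
    variations = {"TZ": "Asia/Tokyo", "LANG": "fr_FR.UTF-8"}
    print(is_reproducible(toy_build, base, variations))  # False
    print(find_culprits(toy_build, base, variations))  # ['TZ']
```

Testing every variation at once is cheap when the build is reproducible; the one-at-a-time pass is only needed to explain a failure.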
Chris Lamb filed two bugs to drop the test jobs for both strip-nondeterminism (#932366) and reprotest (#932374) after modifying them to build on the Salsa server’s own continuous integration platform, and Holger Levsen resolved them shortly afterwards.
Lastly, 63 reviews of Debian packages were added, 72 were updated and 22 were removed this month, adding to our knowledge base of identified issues. Chris Lamb added and categorised four new issue types: umask_in_java_jar_file, built_by-in_java_manifest_mf, timestamps_in_manpages_generated_by_lopsubgen and codadef_coda_data_files.
Software development
The goal of Benjamin Hof’s Software Transparency effort is to improve on the cryptographic signatures of the APT package manager by introducing a Merkle tree-based transparency log for package metadata and source code, in a similar vein to certificate transparency. This month, he pushed a number of repositories to our revision control system for further future development and review.
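For readers unfamiliar with Merkle trees, the sketch below computes a tree head in the style of Certificate Transparency (RFC 6962), where leaves and interior nodes are hashed with distinct one-byte prefixes so that a leaf can never be confused with an interior node. It is a generic illustration under that assumption and not necessarily the exact construction used in the Software Transparency repositories.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # Leaves are prefixed with 0x00
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are prefixed with 0x01
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_tree_hash(leaves) -> bytes:
    """Root hash over a list of byte strings, splitting as RFC 6962 does."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(leaves[0])
    # k is the largest power of two strictly smaller than n
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_tree_hash(leaves[:k]), merkle_tree_hash(leaves[k:]))
```

Because the root changes whenever any entry changes, a log that only ever appends package metadata lets clients detect entries being silently altered or removed.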
In addition, Bernhard M. Wiedemann updated his (deliberately) unreproducible demonstration project to add support for floating point variations as well as changes in the project’s copyright year.
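One common way floating-point results vary between builds is that addition is not associative, so summing the same values in a different order (for example, across threads or in filesystem iteration order) can round differently. A minimal Python illustration, which is not taken from the demonstration project:

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, grouped differently
left_first = (values[0] + values[1]) + values[2]
right_first = values[0] + (values[1] + values[2])
print(left_first, right_first)  # 0.6000000000000001 0.6

# math.fsum() returns the correctly rounded sum, so its result
# does not depend on the order of the inputs
print(math.fsum(values) == math.fsum(reversed(values)))  # True
```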
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - HSAIL-Tools: Sort hash, submitted upstream.
  - MozillaFirefox: Date.
  - MozillaFirefox, MozillaThunderbird: Fix a race condition.
  - blockattack: Sort zip shell call and gzip(1)’s -n.
  - boswars: Sort Python readdir, submitted upstream.
  - bubbros: Strip .png date & time, filed upstream.
  - calibre: Sort Python glob.
  - colobot-data: Sort Python readdir.
  - cri-o: Date.
  - duckmarines: zip -X.
  - filtlan: Fix LaTeX run, dropping an unreproducible log with date.
  - frogatto: Strip .png date & time.
  - gcc: Report link-time optimisation-induced nondeterminism caused by using global constructors.
  - griefly: Sort Python readdir.
  - herbstluftwm: Use CMake TIMESTAMP variable.
  - kitty: Sort filesystem.
  - lbreakout2: Strip .png date & time.
  - lsp-plugins: Use scandir instead of readdir system call.
  - maven-javadoc-plugin: Report copyright using the current year.
  - metamail/mimegrep: Date.
  - mono: Report unreproducible .dll version.
  - netpanzer: Sort SCons glob/readdir.
  - notpacman: Strip .png date & time.
  - obs-build: Use gzip -n in Debian package build.
  - open-iscsi: Fix nondeterministic %ghost file size.
  - perl-File-Unpack: Fix a parallelism-induced race condition, submitted upstream.
  - python-futurist: Drop Python environment.pickle.
  - python-geolib: Drop environment.pickle.
  - python-nautilus: Python date, already filed upstream.
  - python-pyreadstat: Sort Python readdir.
  - python-scikit-image: Drop randomness from package.
  - python-slycot: Drop unreproducible .pyc files.
  - python-statsmodels: Drop unreproducible .pyc files.
  - sienna: zip -X.
  - springrts: Use strip-nondeterminism on .zip modification times.
  - trilinos: Sort a Perl readdir call / File::Find.
  - vdrift: Fix a date exposed by SCons and Python.
  - wordwarvi: Adjust a date, already filed upstream.
  - worldofpadman: Fix a date, merged upstream.
  - yadex: Fix file modification times.
  - znc: Avoid parallelism race from CMakeFiles.
- Chris Lamb:
  - #931706 filed against node-d3-selection.
  - #931854 filed against liblopsub.
  - #932116 filed against snakemake.
  - #932117 filed against ninja-build.
  - #932300 filed against sysvinit.
  - #932301 filed against python-os-faults.
  - #932302 filed against calendar.
  - #932365 filed against python-manilaclient.
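Many of the patches above follow two recurring patterns: sorting directory listings whose order depends on the filesystem, and compressing without embedding a file name or timestamp (as gzip -n does). A minimal Python sketch of both, with hypothetical helper names:

```python
import gzip
import os
import tempfile


def sorted_listing(path):
    # os.listdir() returns entries in an arbitrary, filesystem-dependent order
    return sorted(os.listdir(path))


def deterministic_gzip(data: bytes) -> bytes:
    # Similar in spirit to `gzip -n`: no original file name and a zero timestamp
    return gzip.compress(data, mtime=0)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.txt", "c.txt", "a.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(sorted_listing(tmp))  # ['a.txt', 'b.txt', 'c.txt']
    print(deterministic_gzip(b"data") == deterministic_gzip(b"data"))  # True
```

Note that `gzip.compress` accepts the `mtime` argument from Python 3.8 onwards.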
Neal Gompa, Michael Schröder & Miro Hrončok responded to Fedora’s recent change to rpm-config with some new developments within rpm to fix an unreproducible “Build Date” and reverted a change to the Python interpreter to switch back to unreproducible/time-based compile caches.
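For context, Python 3.7 and later can write hash-based .pyc files (PEP 552), which record a hash of the source instead of its modification time and so do not change when only the timestamp does. The sketch below uses the standard py_compile module to show this; it is a general illustration and not the specific Fedora or rpm change discussed above.

```python
import os
import py_compile
import tempfile


def compile_hash_based(source, cfile):
    # CHECKED_HASH stores a hash of the source in the .pyc header
    return py_compile.compile(
        source,
        cfile=cfile,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )


def pyc_flags(pyc_path):
    # PEP 552 header: 4-byte magic number, then a 4-byte little-endian flags field
    with open(pyc_path, "rb") as f:
        header = f.read(8)
    return int.from_bytes(header[4:8], "little")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("X = 1\n")
        first = compile_hash_based(src, os.path.join(tmp, "first.pyc"))
        os.utime(src, (0, 0))  # change only the source's timestamp
        second = compile_hash_based(src, os.path.join(tmp, "second.pyc"))
        with open(first, "rb") as a, open(second, "rb") as b:
            print(a.read() == b.read())
        print(pyc_flags(first))  # 3: hash-based and checked
```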
Lastly, kpcyrd submitted a pull request for Alpine Linux to add SOURCE_DATE_EPOCH support to the abuild build tool in this operating system.
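The SOURCE_DATE_EPOCH convention asks build tools to use the timestamp given in that environment variable, instead of the current time, whenever they embed a date. A minimal Python sketch of a tool honouring it (the function names are hypothetical):

```python
import datetime
import os
import time


def build_timestamp() -> int:
    # Prefer SOURCE_DATE_EPOCH (seconds since the Unix epoch) over the current time
    return int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))


def build_date_string() -> str:
    # Format in UTC so the result does not depend on the build host's timezone
    ts = build_timestamp()
    return datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )
```

For example, with SOURCE_DATE_EPOCH=0 in the environment, `build_date_string()` returns "1970-01-01 00:00:00 UTC" on every machine.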
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb made the following changes:
- Add support for Java .jmod modules (#60). However, not all versions of file(1) support detection of these files yet, so we perform a manual comparison instead […].
- If a command fails to execute but does not print anything to standard error, try and include the first line of standard output in the message we include in the difference. This was motivated by readelf(1) returning its error messages on standard output. (#59) […]
- Add general support for file(1) 5.37 (#57) but also adjust the code to not fail in tests when, for example, we do not have a sufficiently new or old version of file(1) (#931881).
- Factor out the ability to ignore the exit codes of zipinfo and zipinfo -v in the presence of non-standard headers […], but only override the exit code from our special-cased calls to zipinfo(1) if they are 1 or 2 to avoid potentially masking real errors […].
- Cease ignoring test failures in stable-backports. […]
- Add missing textual DESCRIPTION headers for .zip and “Mozilla”-optimised .zip files. […]
- Merge two overlapping environment variables into a single DIFFOSCOPE_FAIL_TESTS_ON_MISSING_TOOLS. […]
- Update some reporting.
- Add some explicit return values to appease Pylint, etc. […]
- Also include python3-tlsh in the Debian test dependencies. […]
- Released and uploaded versions 116, 117, 118, 119 & 120. […][…][…][…][…]
In addition, Marc Herbert provided a patch to catch failures to disassemble ELF binaries. […]
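The two error-handling behaviours described above, falling back to standard output when standard error is empty and tolerating only specific non-zero exit codes, can be sketched in Python as follows. This is a hypothetical helper written for illustration, not diffoscope's actual code.

```python
import subprocess
import sys


def run_tool(argv, ok_exit_codes=(0,)):
    """Run argv and return its standard output.

    Exit codes listed in ok_exit_codes are treated as success; anything else
    raises RuntimeError with the most useful message available."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode in ok_exit_codes:
        return proc.stdout
    message = proc.stderr.strip()
    if not message:
        # Some tools report errors on standard output instead, so fall back
        # to the first line of it
        lines = proc.stdout.splitlines()
        message = lines[0] if lines else "(no output)"
    raise RuntimeError(f"{argv[0]} exited with {proc.returncode}: {message}")


if __name__ == "__main__":
    # A child process that prints to stdout and then exits with status 2
    cmd = [sys.executable, "-c", "print('bad header'); raise SystemExit(2)"]
    # Tolerate only 1 or 2, as described for the zipinfo(1) calls above
    print(run_tool(cmd, ok_exit_codes=(0, 1, 2)), end="")
    try:
        run_tool(cmd)
    except RuntimeError as exc:
        print(exc)
```

Restricting the tolerated codes to an explicit set, rather than ignoring the exit status entirely, keeps genuine failures visible.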
Project website
There was yet more effort put into our website this month, including:
- Bernhard M. Wiedemann:
  - Update the wording in multiple places to use standard (or at least consistent) terminology. […]
  - Document an alternative Python snippet in the SOURCE_DATE_EPOCH examples. […]
- Chris Lamb:
  - Split out our non-fiscal sponsors with a description […] and make them display three in a row […].
  - Correct references to 1&1 IONOS (née Profitbricks). […]
  - Reduce ambiguity in our environment names. […]
  - Recreate the badge image, saving the .svg alongside it. […]
  - Update our fiscal sponsors. […][…][…]
  - Tidy the weekly reports section on the news page […], fix up the typography on the documentation page […] and make all headlines stand out a bit more […].
  - Drop some old CSS files and fonts. […]
  - Tidy the news page a bit. […]
  - Fix up a number of issues in the report template and previous reports. […][…][…][…][…][…]
Holger Levsen also added explanations on how to install diffoscope on OpenBSD […] and FreeBSD […] to its homepage and Arnout Engelen added a preliminary and work-in-progress idea for a badge or “shield” program for upstream projects. […][…][…].
A special thank you to Alexander Borkowski […], Georg Faerber […] and John Scott […] for their individual fixes. To err is human; to reproduce, divine.
strip-nondeterminism
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Niko Tyni provided a patch to use the Perl Sub::Override library, instead of Monkey::Patch (which was due for deprecation), for some temporary workarounds for issues in Archive::Zip. […]
In addition, Chris Lamb made the following changes:
- Identify data files from the COmmon Data Access (CODA) framework as being .zip files. […]
- Support OpenJDK “.jmod” files. […]
- Pass --no-sandbox if necessary to bypass the seccomp-enabled version of file(1), which was causing a huge number of regressions in our testing framework.
- Don’t just run the tests but build the Debian package instead, using Salsa’s centralised scripts so that we get code coverage, Lintian, autopkgtests, etc. […][…]
- Update tests:
  - Drop misleading and outdated MANIFEST and MANIFEST.SKIP files as they are not used by our release process. […]
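As a rough illustration of what normalising a .zip (or .jar) file involves, the Python sketch below rewrites an archive with sorted members, a fixed timestamp and fixed permissions, the latter covering issues such as the umask leaking into Java .jar files. It is a hypothetical simplification, not strip-nondeterminism's Perl implementation, and it assumes the archive contains regular files only.

```python
import io
import zipfile

# The earliest timestamp the zip format can represent
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def normalise_zip(data: bytes) -> bytes:
    """Rewrite a zip archive deterministically (regular-file members only)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for name in sorted(src.namelist()):
            old = src.getinfo(name)
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            info.compress_type = old.compress_type
            # Fixed 0644 permissions instead of whatever the builder's umask produced
            info.external_attr = 0o644 << 16
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Member contents are copied unchanged; only metadata that varies between build environments (order, timestamps, permissions) is fixed.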
Test framework
We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. The following changes were performed in the last month:
- Holger Levsen:
  - Debian-specific changes:
    - Make a large number of adjustments to support the new Debian bullseye distribution and the release of buster. […][…][…][…][…][…][…] […][…][…][…]
    - Fix the colours for the five suites now being built. […]
    - Make a number of code improvements to the calculation of our “metapackage” sets, including refactoring and changes of email address, etc. […][…][…][…][…]
    - Add the “http-proxy” variable to the displayed node info. […]
  - Alpine changes:
    - Rebuild the webpages every two hours (instead of twice per hour). […]
Holger Levsen […][…][…] and Mattia Rizzolo […][…][…] performed the usual node maintenance and lastly, Vagrant Cascadian added support to generate a reproducible-tracker.json metadata file for the next release of Debian (bullseye). […]
On the mailing list
Chris Lamb cross-posted his reply to the “Re: file(1) now with seccomp support enabled” thread that was originally started on the debian-devel Debian list. In his post, he refers to strip-nondeterminism not being able to accommodate the additional security hardening in file(1) and the changes made to the tool in order to fix this issue, which was causing a huge number of regressions in our testing framework.
Matt Bearup wrote about his experience when he generated different checksums for the libgcrypt20 package, which resulted in some pointers, in particular that one should be using the corresponding .buildinfo post-build certificate when attempting to reproduce any particular build.
Vagrant Cascadian posted a request for comments regarding a potential proposal to the GNU Tools “Cauldron” gathering to be held in Montréal, Canada during September 2019 and Bernhard M. Wiedemann posed a query about using consistent terms on our webpages and elsewhere.
Lastly, in a thread titled “Reproducible Builds - aiming for bullseye: comments and a purpose”, Jathan asked whether we had considered offering “101”-like beginner sessions to fix packages that are not currently reproducible.
Getting in touch
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mailing list: rb-general@lists.reproducible-builds.org
This month’s report was written by Benjamin Hof, Bernhard M. Wiedemann, Chris Lamb, Holger Levsen and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.