Welcome to the May 2020 report from the Reproducible Builds project.
One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. Nonetheless, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.
In these reports we outline the most important things that we and the rest of the community have been up to over the past month.
News
The Corona-Warn app that helps trace infection chains of SARS-CoV-2/COVID-19 in Germany had a feature request filed against it that it build reproducibly.
A number of academics from Cornell University have published a paper titled Backstabber’s Knife Collection which reviews various open source software supply chain attacks:
Recent years saw a number of supply chain attacks that leverage the increasing use of open source during software development, which is facilitated by dependency managers that automatically resolve, download and install hundreds of open source packages throughout the software life cycle.
In related news, the LineageOS Android distribution announced that a hacker had access to the infrastructure of their servers after exploiting an unpatched vulnerability.
Marcin Jachymiak of the Sia decentralised cloud storage platform posted on their blog that their siac and siad utilities can now be built reproducibly:
This means that anyone can recreate the same binaries produced from our official release process. Now anyone can verify that the release binaries were created using the source code we say they were created from. No single person or computer needs to be trusted when producing the binaries now, which greatly reduces the attack surface for Sia users.
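As a rough illustration of the verification step described in the quote, the sketch below compares the SHA-256 digest of a locally rebuilt binary with that of the officially released one. The helper names and any file paths passed to them are hypothetical and are not taken from Sia's tooling; a real check would follow the project's own build instructions first.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(official: Path, rebuilt: Path) -> bool:
    """True if both files have the same digest (hypothetical helper)."""
    return sha256_of(official) == sha256_of(rebuilt)
```

If the digests differ, a tool such as diffoscope can then be used to find out where the two files diverge.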
Synchronicity is a distributed build system for Rust build artifacts which have been published to crates.io. The goal of Synchronicity is to provide a distributed binary transparency system which is independent of any central operator.
The Comparison of Linux distributions article on Wikipedia now features a Reproducible Builds column indicating each distribution's approach to, and progress towards, achieving reproducible builds.
Distribution work
In Debian this month:
- Paul Wise continued a discussion that was started in February regarding the storing and distribution of build logs and other related artifacts and their relationship to reproducible builds. For example, the binutils package ships its own, unreproducible, log files in its binary packages. It was followed up by replies from Chris Lamb and Matthias Klose.
- 34 reviews of Debian packages were added, 20 were updated and 122 were removed this month, adding to our knowledge about identified issues. Chris Lamb added and categorised a new ocaml_cmti_files toolchain issue.
In Alpine Linux, an issue was filed — and closed — regarding the reproducibility of .apk packages.
Allan McRae of the ArchLinux project posted their third Reproducible builds progress report to the arch-dev-public mailing list which includes the following call for help:
We also need help to investigate and fix the packages that fail to reproduce that we have not investigated as of yet.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.
Software development
diffoscope
Chris Lamb made the changes listed below to diffoscope, our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. He also prepared and uploaded versions 142, 143, 144, 145 and 146 to Debian, PyPI, etc.
- Comparison improvements:
  - Improve fuzzy matching of JSON files as file(1) now supports recognising JSON data. (#106)
  - Refactor .changes and .buildinfo handling to show all details (including the GnuPG header and footer components) even when referenced files are not present. (#122)
  - Use our BuildinfoFile comparator (etc.) regardless of whether the associated files (such as the orig.tar.gz and the .deb) are present. […]
  - Include GnuPG signature data when comparing .buildinfo, .changes, etc. […]
  - Add support for printing Android APK signatures via apksigner(1). (#121)
  - Identify “iOS App Zip archive data” as .zip files. (#116)
  - Add support for Apple Xcode .mobileprovision files. (#113)
- Bug fixes:
- Output improvements:
  - Never emit the same id="foo" anchor reference twice in the HTML output, otherwise identically-named parts cannot be linked to via a #foo anchor. (#120)
  - Never emit an empty “id” anchor either; it is not possible to link to #. […]
  - Don’t pretty-print the output when using the --json presenter; it will usually be too complicated to be readable by a human anyway. […]
  - Use SHA256 rather than MD5 when generating page names for the HTML directory-style presenter. (#124)
- Reporting improvements:
  - Clarify the message when we truncate the number of lines to standard error […] and reduce the maximum number of lines printed to 25 as usually the error is obvious by then […].
  - Print the amount of free space that we have available in our temporary directory as a debugging message. […]
  - Clarify “Command […] failed with exit code” messages to remove the duplicate “exited with exit” but also to note that diffoscope is interpreting this as an error. […]
  - Don’t leak the full path of the temporary directory in “Command […] exited with 1” messages. (#126)
  - Clarify the warning message when we cannot import the debian Python module. […]
  - Don’t repeat “stderr from {}” if both commands emit the same output. […]
  - Clarify that an external command emits for both files, otherwise it can look like we are repeating ourselves when, in reality, it is being run twice. […]
- Testsuite improvements:
- Dockerfile improvements:
  - Add a .dockerignore file to whitelist files we actually need in our container. (#105)
  - Use ARG instead of ENV when setting up the DEBIAN_FRONTEND environment variable at runtime. (#103)
  - Run as a non-root user in container. (#102)
  - Install/remove the build-essential package during build so we can install the recommended packages from Git. […]
- Codebase improvements:
  - Bump the officially required version of Python from 3.5 to 3.6. (#117)
  - Drop the (default) shell=False keyword argument to subprocess.Popen so that the potentially-unsafe shell=True is more obvious. […]
  - Perform string normalisation in Black […] and include the Black output in the assertion failure too […].
  - Inline MissingFile’s special handling of deb822 to prevent leaking through abstract layers. […][…]
  - Allow a bare try/except block when cleaning up temporary files with respect to the flake8 quality assurance tool. […]
  - Rename in_dsc_path to dsc_in_same_dir to clarify the use of this variable. […]
  - Abstract out the duplicated parts of the debian_fallback class […] and add descriptions for the file types. […]
  - Various commenting and internal documentation improvements. […][…]
  - Rename the Openssl command class to OpenSSLPKCS7 to accommodate other command names with this prefix. […]
- Misc:
  - Rename the --debugger command-line argument to --pdb. […]
  - Normalise filesystem stat(2) “birth times” (i.e. st_birthtime) in the same way we do with the stat(1) command’s Access: and Change: times to fix a nondeterministic build failure in GNU Guix. (#74)
  - Ignore case when ordering our file format descriptions. […]
  - Drop, add and tidy various module imports. […][…][…][…]
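The anchor-related output fixes listed above (#120 and the empty “id” case) can be pictured with a small sketch. This is not diffoscope's actual code: the function name unique_anchor and the numeric-suffix scheme are assumptions chosen purely for illustration. The helper skips empty names and never hands out the same id twice.

```python
from typing import Optional, Set


def unique_anchor(name: str, seen: Set[str]) -> Optional[str]:
    """Return an id not yet in `seen`, or None for an empty name.

    Hypothetical helper: duplicates get a numeric suffix (foo, foo-2, ...).
    """
    if not name:
        return None  # an empty id cannot be linked to, so emit none

    candidate = name
    counter = 2
    while candidate in seen:
        candidate = f"{name}-{counter}"
        counter += 1

    seen.add(candidate)
    return candidate


seen: Set[str] = set()
anchors = [unique_anchor(n, seen) for n in ["foo", "foo", "", "bar", "foo"]]
print(anchors)  # ['foo', 'foo-2', None, 'bar', 'foo-3']
```

Keeping the set of already-issued ids outside the function lets a single HTML page share one namespace across all of its sections.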
In addition:
- Jean-Romain Garnier fixed a general issue where, for example, LibarchiveMember’s has_same_content method was called regardless of the underlying type of file. […]
- Daniel Fullmer fixed an issue where some filesystems could only be mounted read-only. (!49)
- Emanuel Bronshtein provided a patch to prevent a build of the Docker image containing parts of the build’s. (#123)
- Mattia Rizzolo added an entry to debian/py3dist-overrides to ensure the rpm-python module is used in package dependencies (#89) and moved to using the new execute_after_* and execute_before_* Debhelper rules […].
Chris Lamb also performed a huge overhaul of diffoscope’s website:
- Add a completely new design. […][…]
- Dynamically generate our contributor list […] and supported file formats […] from the main Git repository.
- Add a separate, canonical page for every new release. […][…][…]
- Generate a ‘latest release’ section and display that with the corresponding date on the homepage. […]
- Add an RSS feed of our releases […][…][…][…][…] and add to Planet Debian […].
- Use Jekyll’s absolute_url and relative_url where possible […][…] and move a number of configuration variables to _config.yml […][…].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - golang-packaging (toolchain issue, affecting times in minikube)
  - jboss-logging-tools (toolchain issue, affecting date for resteasy)
  - linux_logo (sort find output to avoid inheriting filesystem order)
  - moonjit (generate reproducible output by default if SOURCE_DATE_EPOCH is set)
  - vala (report ASLR nondeterminism)
- Jelle van der Waa:
  - earlyoom (timestamps in Gzip files)
  - fmt (don’t install sphinx-build cached files as they are unneeded & unreproducible)
  - nvidia-settings (timestamp in Gzip files)
- Chris Lamb:
  - #959714 filed against ataqv.
  - #960313 filed against elinks.
  - #960386 filed against briquolo.
  - #960388 filed against cryptominisat.
  - #960590 filed against wolfssl.
  - #960591 filed against mistral.
  - #960607 filed against python-watcherclient.
  - #960669 filed against tree-puzzle.
  - #961009 filed against nulib2.
  - #961202 filed against process-cpp.
  - #961494 filed against bowtie2.
  - #961495 filed against properties-cpp.
  - #961582 filed against wand (forwarded upstream).
  - #961657 filed against vows.
- Vagrant Cascadian:
  - #961747 filed against libstatgrab.
  - #961764 filed against texi2html.
  - #961766 filed against grub.
  - #961830 filed against systemtap.
  - #961942 filed against mono.
  - mescc-tools: Inherit CFLAGS in a Makefile, allowing -ffile-prefix-map/-fdebug-prefix-map to sanitise build paths (merged upstream).
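Several of the patches above address the same two root causes: embedded timestamps (for example in Gzip headers) and filesystem ordering. The sketch below shows one common remedy for each in Python. It is not taken from any of the patched packages; the function names are hypothetical, and falling back to 0 when SOURCE_DATE_EPOCH is unset is an assumption made here to keep the example deterministic.

```python
import gzip
import io
import os
from pathlib import Path
from typing import List


def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH if set, else 0 (an assumption of this sketch;
    some tools fall back to the current time instead)."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def deterministic_gzip(data: bytes) -> bytes:
    """Compress `data` without embedding the current time or a filename."""
    buffer = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", fileobj=buffer,
                       mtime=build_timestamp()) as handle:
        handle.write(data)
    return buffer.getvalue()


def sorted_files(directory: Path) -> List[Path]:
    """List regular files in a stable order, independent of the filesystem."""
    return sorted(p for p in directory.rglob("*") if p.is_file())
```

Compressing the same input twice with deterministic_gzip should give identical bytes, whereas the default behaviour of gzip.GzipFile is to record the current time in the header.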
Other tools
Elsewhere in our tooling:
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In May, Chris Lamb uploaded version 1.8.1-1 to Debian unstable and Bernhard M. Wiedemann fixed an “off-by-one” error when parsing PNG image modification times. (#16)
In disorderfs, our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues, Chris Lamb replaced the term “dirents” with “directory entries” in human-readable output/log messages […] and applied the astyle source code formatter with the default settings to the main disorderfs.cpp source file […].
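The idea behind disorderfs can be approximated in a few lines of Python: shuffle a directory listing and check whether the output of a build step changes. This is only a sketch of the concept, since disorderfs itself works at the FUSE level; every function name below is hypothetical.

```python
import hashlib
import random
from typing import Callable, List


def fragile_manifest(names: List[str]) -> str:
    """Hash names in the order given, so the result depends on ordering."""
    return hashlib.sha256("\n".join(names).encode()).hexdigest()


def robust_manifest(names: List[str]) -> str:
    """Hash names in sorted order, so the result ignores input ordering."""
    return fragile_manifest(sorted(names))


def is_order_independent(step: Callable[[List[str]], str],
                         names: List[str], trials: int = 20) -> bool:
    """Run `step` on shuffled copies of `names`; True if output never varies."""
    rng = random.Random(0)  # fixed seed keeps the check itself repeatable
    outputs = set()
    for _ in range(trials):
        shuffled = names[:]
        rng.shuffle(shuffled)
        outputs.add(step(shuffled))
    return len(outputs) == 1
```

A step that passes this check is not guaranteed to be reproducible, but one that fails it almost certainly depends on directory order.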
Holger Levsen bumped the debhelper-compat level to 13 in disorderfs […] and reprotest […], and for the GNU Guix distribution Vagrant Cascadian updated the versions of disorderfs to version 0.5.10 […] and diffoscope to version 145 […].
Project documentation & website
- Carl Dong:
  - Clarify some potential confusion around GCC libtool. […]
- Chris Lamb:
  - Rename the “Who” page to “Projects”. […]
  - Ensure that Jekyll enters the _docs subdirectory to find the _docs/index.md file after an internal move. (#27)
  - Wrap ltmain.sh etc. in preformatted quotes. […]
  - Wrap the SOURCE_DATE_EPOCH Python examples onto more lines to prevent visual overflow on the page. […]
  - Correct a “preferred” spelling error. […]
- Holger Levsen:
  - Sort our Academic publications page by publication year […] and add “Trusting Trust” and “Fully Countering Trusting Trust through Diverse Double-Compiling” […].
- Juri Dispan:
  - Update the URL for faketime to the project’s GitHub page. (!57)
Testing framework
We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, it tracks the status of our reproducibility efforts and identifies any regressions that have been introduced. Holger Levsen made the following changes:
-
System health status:
-
  - Fail loudly if there are more than three .buildinfo files with the same name. […]
  - Fix a typo which prevented /usrmerge variation on Debian unstable. […]
  - Temporarily ignore PHP’s horde (https://www.horde.org/) packages in Debian bullseye. […]
  - Document how to reboot all nodes in parallel, working around molly-guard. […]
- Further work on a Debian package rebuilder:
  - Work around and document various issues in the debrebuild script. […][…][…][…]
  - Improve output in the case of errors. […][…][…][…]
  - Improve documentation and future goals […][…][…][…], in particular documenting two real-world test cases for an “impossible to recreate build environment” […].
  - Find the right source package to rebuild. […]
  - Increase the frequency we run the script. […][…][…][…]
  - Improve downloading and selection of the sources to build. […][…][…]
  - Improve version string handling. […]
  - Handle build failures better. […][…][…]
  - Also consider “architecture all” .buildinfo files. […][…]
In addition:
- kpcyrd, for Alpine Linux, updated the alpine_schroot.sh script now that a patch for abuild had been released upstream. […]
- Alexander Couzens of the OpenWrt project renamed the brcm47xx target to bcm47xx. […]
- Mattia Rizzolo fixed the printing of the build environment during the second build […][…][…] and made a number of improvements to the script that deploys Jenkins across our infrastructure […][…][…].
Lastly, Vagrant Cascadian clarified in the documentation that you need to be user jenkins to run the blacklist command […] and the usual build node maintenance was performed by Holger Levsen […][…][…], Mattia Rizzolo […][…] and Vagrant Cascadian […][…][…].
Mailing list:
There were a number of discussions on our mailing list this month:
Paul Spooren started a thread titled Reproducible Builds Verification Format which reopens the discussion around a schema for sharing the results from distributed rebuilders:
To make the results accessible, storable and create tools around them, they should all follow the same schema, a reproducible builds verification format. The format tries to be as generic as possible to cover all open source projects offering precompiled source code. It stores the rebuilder results of what is reproducible and what not.
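To make the idea of a shared schema more concrete, here is a minimal sketch of how a rebuilder might serialise its results as JSON. It is not the format proposed in the thread: every field name (origin, rebuilder, results, artifact, reproducible) and both function names are hypothetical and chosen only for illustration.

```python
import json
from typing import Dict, List


def make_record(origin: str, rebuilder: str,
                results: Dict[str, bool]) -> str:
    """Serialise rebuild results; all field names here are hypothetical."""
    record = {
        "origin": origin,
        "rebuilder": rebuilder,
        "results": [
            {"artifact": name, "reproducible": ok}
            for name, ok in sorted(results.items())
        ],
    }
    # sort_keys gives a byte-for-byte stable encoding for identical input
    return json.dumps(record, sort_keys=True, indent=2)


def unreproducible(serialised: str) -> List[str]:
    """Return the artifact names marked as not reproducible."""
    record = json.loads(serialised)
    return [r["artifact"] for r in record["results"] if not r["reproducible"]]
```

Sorting both the keys and the result entries means that two rebuilders reporting the same outcome produce identical documents, which makes the records themselves easy to compare.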
Hans-Christoph Steiner of the Guardian Project also continued his previous discussion regarding making our website translatable.
Lastly, Leo Wandersleb posted a detailed request for feedback on a question of supply chain security and other issues of software review; Leo is the founder of the Wallet Scrutiny project which aims to prove the security of Android Bitcoin Wallets:
Do you own your Bitcoins or do you trust that your app allows you to use “your” coins while they are actually controlled by “them”? Do you have a backup? Do “they” have a copy they didn’t tell you about? Did anybody check the wallet for deliberate backdoors or vulnerabilities? Could anybody check the wallet for those?
Elsewhere, Leo had posted instructions on his attempts to reproduce the binaries for the BlueWallet Bitcoin wallet for iOS and Android platforms.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Reddit: /r/ReproducibleBuilds
- Mailing list: rb-general@lists.reproducible-builds.org
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.







