Welcome to the June 2020 report from the Reproducible Builds project. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.
What are reproducible builds?
One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security.
But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.
News
The GitHub Security Lab published a long article on the discovery of a piece of malware designed to backdoor open source projects that used the build process and its resulting artifacts to spread itself. In the course of their analysis and investigation, the GitHub team uncovered 26 open source projects that were backdoored by this malware and were actively serving malicious code. (Full article)
Carl Dong from Chaincode Labs uploaded a presentation on Bitcoin Build System Security and reproducible builds to YouTube.
The app intended to trace infection chains of Covid-19 in Switzerland published information on how to perform a reproducible build.
The Reproducible Builds project has received funding in the past from the Open Technology Fund (OTF) to reach specific technical goals, as well as to enable the project to meet in-person at our summits. The OTF has also assisted countless other organisations that promote transparent, civil society as well as those that provide tools to circumvent censorship and repressive surveillance. However, the OTF has now been threatened with closure. (More info)
It was noticed that Reproducible Builds was mentioned in the book End-user Computer Security by Mark Fernandes (published by WikiBooks) in the section titled Detection of malware in software.
Lastly, reproducible builds and other ideas around software supply chain were mentioned in a recent episode of the Ubuntu Podcast in a wider discussion about the Snap and application stores (at approx 16:00).
Distribution work
In the Arch Linux distribution, a goal to remove .doctrees from installed files was created via Arch’s ‘TODO list’ mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, particularly because their pickled format contains unreproducible data that makes the package itself unreproducible. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default.
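As a rough illustration, a packager could scan a staged install tree for leftover Sphinx caches before the package is built. The sketch below is not Arch's tooling; the function name `find_doctrees` is a hypothetical helper, and it simply assumes that the caches appear as directories named `.doctrees` or as stray `*.doctree` files.

```python
import os


def find_doctrees(root):
    """Return sorted paths of Sphinx cache artefacts found under root.

    Reports directories named ".doctrees" (without descending into them)
    and any "*.doctree" files found elsewhere in the tree.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if name == ".doctrees":
                hits.append(os.path.join(dirpath, name))
                # Prune so the cache contents are not listed one by one.
                dirnames.remove(name)
        for name in filenames:
            if name.endswith(".doctree"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

A non-empty result would suggest that the package's install step is shipping caches that should be removed or never generated into the install tree.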
Dimitry Andric was able to determine why the reproducibility status of FreeBSD’s base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler […]. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again […].
For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan “janneke” Nieuwenhuizen posted that they had further reduced their “bootstrap seed” to 25% which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update and also made the following changes within the distribution itself:
- autogen (Date issue)
- carla (Timestamp in Windows Portable Executable executables)
- fonttosfnt/xorg-x11-fonts (Address space layout randomization issue)
- fossil (Date issue)
- gcc10 C++ (Link-time optimisation issue)
- grep (Profile-guided optimisation issue)
- kubernetes1.18 (Remove Go build identifier)
- libjcat (Remove certificate)
- lifelines (Date issue)
- miredo (Drop hostname)
- stressapptest (Override date, user & host)
Debian
Holger Levsen filed three bugs (#961857, #961858 & #961859) against the reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb […][…][…].
Timo Röhling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of hundreds of packages that use the CMake build system, which led to a number of tests and next steps. […]
Chris Lamb contributed to a conversation regarding the nondeterministic order of execution of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system’s approach to this […]. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds.
47 reviews of Debian packages were added, 37 were updated and 69 were removed this month, adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue […] and updated the paths_vary_due_to_usrmerge issue, marking it as deterministic, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues. […][…][…].
Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).
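For context, the compiler argument takes the form -fdebug-prefix-map=OLD=NEW and asks the compiler to record NEW in place of OLD in debug information, so that the build directory does not leak into the resulting binary. The sketch below only shows how a build wrapper might append such a flag to the conventional CFLAGS variable; the function name and the choice of "." as the replacement are illustrative assumptions, and it does not represent the environment-variable mechanism proposed in the thread.

```python
def with_debug_prefix_map(environ, build_dir, replacement="."):
    """Return a copy of environ whose CFLAGS maps build_dir to replacement.

    The input mapping is left untouched; any existing CFLAGS are kept and
    the new flag is appended after them.
    """
    flag = f"-fdebug-prefix-map={build_dir}={replacement}"
    env = dict(environ)
    existing = env.get("CFLAGS", "")
    env["CFLAGS"] = f"{existing} {flag}".strip()
    return env
```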
Development
On our website this month, Arnout Engelen added a link to our Mastodon account […] and moved the SOURCE_DATE_EPOCH git log example to another section […]. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017 […].
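SOURCE_DATE_EPOCH is an environment variable holding a Unix timestamp that build tools are expected to use in place of the current time. As a minimal sketch, a build script might honour it as follows; the function name `build_date` is hypothetical, and the git invocation mentioned in the comment is only one common way of deriving a value from the most recent commit.

```python
import os
import time
from datetime import datetime, timezone


def build_date(environ=os.environ):
    """Return the timestamp to embed in build output, as a UTC datetime.

    Uses SOURCE_DATE_EPOCH when it is set (for example to the output of
    `git log -1 --pretty=%ct`), falling back to the current time.
    """
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

Formatting the result in UTC rather than local time avoids introducing a second source of variation from the build machine's timezone.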
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 […] and adjusted a related dependency to avoid a potential circular dependency […].
Upstream work
The Reproducible Builds project attempts to fix unreproducible packages and we try to send all of our patches upstream. This month, we wrote a large number of such patches including:
- Andreas Schleifer:
- Bernhard M. Wiedemann:
  - autogen (race condition)
  - cockpit (date)
  - fossil (date)
  - libnvidia-container (date)
  - libv3270 (date)
- Chris Lamb:
  - #962401 filed against netcdf-fortran.
  - #962589 filed against seqtools.
  - #962702 filed against python-pauvre.
  - #963119 filed against petitboot.
  - #963120 filed against fonts-anonymous-pro.
  - #963124 filed against python-pyqtgraph (forwarded upstream).
  - #963485 filed against libqmi.
  - #963486 filed against tkabber-plugins.
  - #963533 filed against python-stem.
  - #963537 filed against golang-v2ray-core.
  - #963600 filed against critcl.
  - #963602 filed against gftl.
  - #963603 filed against libmbim.
  - #963688 filed against neovim-qt.
  - #963740 filed against golang-github-viant-toolbox.
- Hendrik Meyer:
- Nick Wellnhofer:
  - libxml2 (random data corruption)
- Jelle van der Waa:
- Eli Schwartz:
- Vagrant Cascadian:
Bernhard M. Wiedemann also filed reports for frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).
diffoscope
diffoscope is our in-depth ‘diff-on-steroids’ utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides helpful and human-readable guidance for packages that are not reproducible, rather than relying on essentially-useless binary diffs.
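Before reaching for diffoscope, the first question is usually whether two independent builds produced bit-for-bit identical artefacts. A minimal sketch of that check follows, assuming SHA-256 and two file paths supplied by the caller; the function names are hypothetical, and this is not how diffoscope itself works, it only decides whether a deeper comparison is needed.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(first_build, second_build):
    """True if both build artefacts are bit-for-bit identical."""
    return sha256_of(first_build) == sha256_of(second_build)
```

When the digests differ, a tool such as diffoscope can then explain where and why the two artefacts diverge.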
This month, Chris Lamb uploaded versions 147, 148 and 149 to Debian and made the following changes:
- New features:
- Bug fixes:
  - Prevent a traceback when comparing PDF documents that did not contain metadata (ie. a PDF /Info stanza). (#150)
  - Fix compatibility with jsondiff version 1.2.0. (#159)
  - Fix an issue in GnuPG keybox file handling that left filenames in the diff. […]
  - Correct detection of JSON files due to missing call to File.recognizes that checks candidates against file(1). […]
- Output improvements:
- Logging improvements:
  - Log calls to subprocess.check_output by using a wrapper. (#151)
  - Clarify that we are generating presenter formats in a debug-level message. […]
  - Log the version of jsondiff used. […]
- Testsuite improvements:
- Codebase improvements:
  - Replace obscure references to WF with “Wagner-Fischer” for clarity. […]
  - Use a semantic AbstractMissingType type instead of remembering to check for both types of ‘missing’ files. […]
  - Add a comment regarding potential security issue in the .changes, .dsc and .buildinfo comparators. […]
  - Drop a large number of unused imports. […][…][…][…][…]
  - Make many code sections more Pythonic. […][…][…][…]
  - Prevent some variable aliasing issues. […][…][…]
  - Use some tactical f-strings to tidy up code […][…] and remove explicit u"unicode" strings […].
  - Refactor a large number of routines for clarity. […][…][…][…]
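For readers unfamiliar with the name, Wagner-Fischer is the classic dynamic-programming algorithm for computing the edit distance between two sequences. The following is a generic textbook sketch using two rows of the table, with unit costs for insertion, deletion and substitution; it is not taken from the diffoscope codebase and the function name is chosen for illustration only.

```python
def wagner_fischer(source, target):
    """Return the Levenshtein edit distance between two sequences."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j items of target.
    previous = list(range(len(target) + 1))
    for i, source_item in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to an empty target
        for j, target_item in enumerate(target, start=1):
            substitution = 0 if source_item == target_item else 1
            current.append(
                min(
                    previous[j] + 1,  # delete from source
                    current[j - 1] + 1,  # insert into source
                    previous[j - 1] + substitution,  # match or substitute
                )
            )
        previous = current
    return previous[-1]
```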
trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb also corrected the location for the celerybeat scheduler to ensure that the clean/tidy tasks are actually called; these tasks not running had caused accidental resource exhaustion. (#12)
In addition Jean-Romain Garnier made the following changes:
- Fix the --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
- Allow user to mask/filter diff output via --diff-mask=REGEX. (!51)
- Make child pages open in new window in the --html-dir presenter format. […]
- Improve the diffs in the --html-dir format. […][…]
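The general idea behind masking can be sketched as a regular-expression substitution applied to both inputs before they are compared, so that known-variable fragments such as dates no longer show up as differences. This is an illustrative assumption about the concept and not diffoscope's implementation; `mask_lines` and the placeholder text are hypothetical.

```python
import re


def mask_lines(lines, pattern, placeholder="[masked]"):
    """Replace every match of pattern in each line with placeholder."""
    regex = re.compile(pattern)
    return [regex.sub(placeholder, line) for line in lines]
```

Two outputs that differ only in the masked fragments will then compare as equal, leaving any remaining differences easier to spot.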
Lastly, Daniel Fullmer fixed the Coreboot filesystem comparator […] and Mattia Rizzolo prevented warnings from the tlsh fuzzy-matching library during tests […] and tweaked the build system to remove an unwanted .build directory […]. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 […] and later 148 […].
Testing framework
We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:
- Debian-related changes:
  - Prevent bogus failure emails from rsync2buildinfos.debian.net every night. […]
  - Merge a fix from David Bremner’s database of .buildinfo files to include a fix regarding comparing source vs. binary package versions. […]
  - Only run the Debian package rebuilder job twice per day. […]
  - Increase bullseye scheduling. […]
- System health status page:
  - Add a note displaying whether a node needs to be rebooted for a kernel upgrade. […]
  - Fix sorting order of failed jobs. […]
  - Expand footer to link to the related Jenkins job. […]
  - Add archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to ‘known broken’ jobs. […]
  - Add HTML <meta> header to refresh the page every 5 minutes. […]
  - Count the number of ignored jobs […], ignore permanently ‘known broken’ jobs […] and jobs on ‘known offline’ nodes […].
  - Only consider the ‘known offline’ status from Git. […]
  - Various output improvements. […][…]
- Tools:
  - Switch URLs for the Grml Live Linux and PureOS package sets. […][…]
  - Don’t try to build a disorderfs Debian source package. […][…][…]
  - Stop building diffoscope as we are moving this to Salsa. […][…]
  - Merge several “is diffoscope up-to-date on every platform?” test jobs into one […] and fail less noisily if the version in Debian cannot be determined […].
In addition: Marcus Hoffmann was added as a maintainer of the F-Droid reproducible checking components […], Jelle van der Waa updated the “is diffoscope up-to-date in every platform” check for Arch Linux and diffoscope […], Mattia Rizzolo backed up a copy of a “remove script” run on the Codethink-hosted ‘jump server’ […] and Vagrant Cascadian temporarily disabled the fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue.
Lastly, the usual build node maintenance was performed by Holger Levsen […][…], Mattia Rizzolo […] and Vagrant Cascadian […][…][…][…][…].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mastodon: @reproducible_builds@fosstodon.org
- Reddit: /r/ReproducibleBuilds
- Mailing list: rb-general@lists.reproducible-builds.org
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.