Reproducible Builds in June 2020

View all our monthly reports


Welcome to the June 2020 report from the Reproducible Builds project. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

What are reproducible builds?

One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security.

But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News

The GitHub Security Lab published a long article on the discovery of a piece of malware designed to backdoor open source projects that used the build process and its resulting artifacts to spread itself. In the course of their analysis and investigation, the GitHub team uncovered 26 open source projects that were backdoored by this malware and were actively serving malicious code. (Full article)

Carl Dong from Chaincode Labs uploaded a presentation on Bitcoin Build System Security and reproducible builds to YouTube:

The app intended to trace infection chains of Covid-19 in Switzerland published information on how to perform a reproducible build.

The Reproducible Builds project has received funding in the past from the Open Technology Fund (OTF) to reach specific technical goals, as well as to enable the project to meet in-person at our summits. The OTF has actually also assisted countless other organisations that promote transparent, civil society as well as those that provide tools to circumvent censorship and repressive surveillance. However, the OTF has now been threatened with closure. (More info)

It was noticed that Reproducible Builds was mentioned in the book End-user Computer Security by Mark Fernandes (published by WikiBooks) in the section titled Detection of malware in software.

Lastly, reproducible builds and other ideas around software supply chain were mentioned in a recent episode of the Ubuntu Podcast in a wider discussion about the Snap and application stores (at approx 16:00).


Distribution work

In the ArchLinux distribution, a goal to remove .doctrees from installed files was created via Arch’s ‘TODO list’ mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, especially as they lead to the package being unreproducible as their pickled format contains unreproducible data. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default.

Dimitry Andric was able to determine why the reproducibility status of FreeBSD’s base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler []. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again [].

For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan “janneke” Nieuwenhuizen posted that they had further reduced their “bootstrap seed” to 25% which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors.

In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as made the following changes within the distribution itself:

Debian

Holger Levsen filed three bugs (#961857, #961858 & #961859) against the reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb [][][].

Timo Röhling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of 100s of packages that use the CMake build system which led to a number of tests and next steps. []

Chris Lamb contributed to a conversation regarding the nondeterministic execution of order of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system’s approach this []. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds.

47 reviews of Debian packages were added, 37 were updated and 69 were removed this month adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue [] and updated the paths_vary_due_to_usrmerge as deterministic issue, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues. [][][].

Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).

Development

On our website this month, Arnout Engelen added a link to our Mastodon account [] and moved the SOURCE_DATE_EPOCH git log example to another section []. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017 [].

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 [] and adjusted a related dependency to avoid potential circular dependency [].

Upstream work

The Reproducible Builds project attempts to fix unreproducible packages and we try to to send all of our patches upstream. This month, we wrote a large number of such patches including:

Bernhard M. Wiedemann also filed reports for frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).

diffoscope

diffoscope is our in-depth ‘diff-on-steroids’ utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides a helpful and human-readable guidance for packages that are not reproducible, rather than relying essentially-useless binary diffs.

This month, Chris Lamb uploaded versions 147, 148 and 149 to Debian and made the following changes:

  • New features:

    • Add output from strings(1) to ELF binaries. (#148)
    • Dump PE32+ executables (such as EFI applications) using objdump(1). (#181)
    • Add support for Zsh shell completion. (#158)
  • Bug fixes:

    • Prevent a traceback when comparing PDF documents that did not contain metadata (ie. a PDF /Info stanza). (#150)
    • Fix compatibility with jsondiff version 1.2.0. (#159)
    • Fix an issue in GnuPG keybox file handling that left filenames in the diff. []
    • Correct detection of JSON files due to missing call to File.recognizes that checks candidates against file(1). []
  • Output improvements:

    • Use the CSS word-break property over manually adding U+200B zero-width spaces as these were making copy-pasting cumbersome. (!53)
    • Downgrade the tlsh warning message to an ‘info’ level warning. (#29)
  • Logging improvements:

  • Testsuite improvements:

    • Update tests for file(1) version 5.39. (#179)
    • Drop accidentally-duplicated copy of the --diff-mask tests. []
    • Don’t mask an existing test. []
  • Codebase improvements:

    • Replace obscure references to WF with “Wagner-Fischer” for clarity. []
    • Use a semantic AbstractMissingType type instead of remembering to check for both types of ‘missing’ files. []
    • Add a comment regarding potential security issue in the .changes, .dsc and .buildinfo comparators. []
    • Drop a large number of unused imports. [][][][][]
    • Make many code sections more Pythonic. [][][][]
    • Prevent some variable aliasing issues. [][][]
    • Use some tactical f-strings to tidy up code [][] and remove explicit u"unicode" strings [].
    • Refactor a large number of routines for clarity. [][][][]

trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb also corrected the location for the celerybeat scheduler to ensure that the clean/tidy tasks are actually called which had caused an accidental resource exhaustion. (#12)

In addition Jean-Romain Garnier made the following changes:

  • Fix the --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
  • Allow user to mask/filter diff output via --diff-mask=REGEX. (!51)
  • Make child pages open in new window in the --html-dir presenter format. []
  • Improve the diffs in the --html-dir format. [][]

Lastly, Daniel Fullmer fixed the Coreboot filesystem comparator [] and Mattia Rizzolo prevented warnings from the tlsh fuzzy-matching library during tests [] and tweaked the build system to remove an unwanted .build directory []. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 [] and later 148 [].

Testing framework

We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:

  • Debian-related changes:

    • Prevent bogus failure emails from rsync2buildinfos.debian.net every night. []
    • Merge a fix from David Bremner’s database of .buildinfo files to include a fix regarding comparing source vs. binary package versions. []
    • Only run the Debian package rebuilder job twice per day. []
    • Increase bullseye scheduling. []
  • System health status page:

    • Add a note displaying whether a node needs to be rebooted for a kernel upgrade. []
    • Fix sorting order of failed jobs. []
    • Expand footer to link to the related Jenkins job. []
    • Add archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to ‘known broken’ jobs. []
    • Add HTML <meta> header to refresh the page every 5 minutes. []
    • Count the number of ignored jobs [], ignore permanently ‘known broken’ jobs [] and jobs on ‘known offline’ nodes [].
    • Only consider the ‘known offline’ status from Git. []
    • Various output improvements. [][]
  • Tools:

    • Switch URLs for the Grml Live Linux and PureOS package sets. [][]
    • Don’t try to build a disorderfs Debian source package. [][][]
    • Stop building diffoscope as we are moving this to Salsa. [][]
    • Merge several “is diffoscope up-to-date on every platform?” test jobs into one [] and fail less noisily if the version in Debian cannot be determined [].

In addition: Marcus Hoffmann was added as a maintainer of the F-Droid reproducible checking components [], Jelle van der Waa updated the “is diffoscope up-to-date in every platform” check for Arch Linux and diffoscope [], Mattia Rizzolo backed up a copy of a “remove script” run on the Codethink-hosted ‘jump server‘ [] and Vagrant Cascadian temporarily disabled the fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue.

Lastly, the usual build node maintenance was performed by Holger Levsen [][], Mattia Rizzolo [] and Vagrant Cascadian [][][][][].



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:


This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.




View all our monthly reports