Reproducible Builds in June 2021


Welcome to the latest report from the Reproducible Builds project for June 2021. In these reports we outline the most important things that have been happening in the world of reproducible builds in the past month. As ever, if you are interested in contributing to the project, please visit the Contribute page on our website.

Community news

Jake Edge of Linux Weekly News (LWN) published a lengthy article on 16th June describing various steps taken by the Fedora Linux distribution with respect to preventing supply-chain attacks:

The specter of more events like the SolarWinds supply-chain attacks is something that concerns many in our communities—and beyond. Linux distributions provide a supply chain that obviously needs to be protected against attackers injecting malicious code into the update stream. This problem recently came up on the Fedora devel mailing list, which led to a discussion covering a few different topics. For the most part, Fedora users are protected against such attacks, which is not to say there is nothing more to be done, of course.

The Google Security Blog introduced a new framework called “Supply chain Levels for Software Artifacts”, or SLSA (to be pronounced as ‘salsa’). In particular, SLSA level 4 (“currently the highest level”) not only requires a two-person review of all changes but also “a hermetic, reproducible build process” due to its “many auditability and reliability benefits”. Whilst this is a highly welcome inclusion in Google’s requirements, equating reproducible builds with only the highest level of supply-chain security might lead others to conclude that only the most secure systems can benefit from reproducible builds, whereas the Reproducible Builds project believes that many more users, if not all, can do so.

Many media outlets (including The Verge) reported on how the United States’ FBI operated a messaging app as a ‘honeypot trap’ for a long period of time, leading to hundreds of arrests. According to the UK’s Financial Times, court documents describe how the FBI persuaded a software developer facing prison to allow the FBI to commandeer the app and to introduce it to suspected criminals:

Over the course of the next three years, the operation was able to inspect about 27m messages over 11,800 devices as ANOM gained popularity in criminal circles globally, pushed by the developer but also a network of crime “influencers” — experts in encrypted phones who encourage others to use such devices.

As the Financial Times reports, “it is unclear what exactly prompted the FBI and others to reveal the operation”, although others have suggested it may result from legal limits in timeframes for intercepting communications. The FBI’s operation raises ethical concerns which overlap with beliefs held by proponents of Reproducible Builds, not least of all because even the most unimpeachable actions by actors may result in the incidental surveillance of innocent people.

In similar legal news, Susan Landau posted to the Lawfare blog about the potential dangers posed by evidentiary software. In particular, she discusses concerns that proprietary software may be fundamentally incompatible with the right of defendants to know the nature of the evidence against them; this right is explicitly enshrined, for instance, in the Sixth Amendment of the United States Constitution. However,

At the time of our writing the article on the use of software as evidence, there was no overriding requirement that [United States] law enforcement provide a defendant with the code so that they might examine it themselves.

This is relevant here because if the inability to consult the relevant source code does violate such rights, it may follow that a secure and reproducible build process will also be required. After all, it is the output of the binaries built from the source code that is used to convict suspects, not the source code itself. As Susan points out:

Mistakes happen with software and sometimes the only way to find errors is to study the code itself—both of which have important implications for courtroom use of software programs.

The Reproducible Builds project restarted its IRC meetings this month. Taking place on the #reproducible-builds channel on the OFTC IRC network, the log of the meeting on 29th June is now available online, and the next meeting is due to take place on 27th July at 15:00 UTC (agenda).

Ars Technica reported that “counterfeit” packages in PyPI, the official Python package repository, contained secret code that installed cryptomining software on infected machines: “So-called typosquatting attacks succeed when targets accidentally mistype a name such as typing mplatlib or maratlib instead of the legitimate and popular package, matplotlib”. The article is at pains to point out that PyPI is not abused any more than other repositories are:

Last year, packages downloaded thousands of times from RubyGems installed malware that attempted to intercept bitcoin payments. Two years before that, someone backdoored a 2-million-user code library hosted in NPM. Sonatype has tracked more than 12,000 malicious NPM packages since 2019.
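To make the typosquatting mechanism concrete, the following is a minimal sketch of how a near-miss package name might be flagged before installation. It is not how PyPI or any existing tool works: the `POPULAR_PACKAGES` list, the `possible_typosquat` function and the similarity cutoff are all assumptions made for illustration, and only Python's standard `difflib` module is used.

```python
import difflib

# Hypothetical allow-list of well-known package names (an assumption for this
# sketch); a real tool would need a far larger, curated list.
POPULAR_PACKAGES = ["matplotlib", "numpy", "requests", "django"]


def possible_typosquat(name, known=POPULAR_PACKAGES, cutoff=0.6):
    """Return the well-known package that `name` closely resembles, or None.

    An exact match is the legitimate package itself, so it is not flagged.
    """
    name = name.lower()
    if name in known:
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio and
    # drops anything scoring below `cutoff`.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A string-similarity check like this is only a heuristic: a low cutoff produces false positives for legitimately similar names, and it does nothing against a malicious package with an unrelated name.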

Distribution work

Ariadne Conill published a detailed blog post this month describing their work on security issues and concerns in the Alpine Linux distribution. In particular, Ariadne included an interesting section on an effort “to prove the reproducibility of Alpine package builds”:

To this end, I hope to have the Alpine 3.15 build fully reproducible. This will require some changes to abuild so that it produces buildinfo files, as well as a rebuilder backend. We plan to use the same buildinfo format as Arch Linux, and will likely adapt some of the other reproducible builds work Arch has done to Alpine.

Ariadne mentioned plans to have a meeting and a sprint during July, to be organised in and around the #alpine-reproducible channel on the OFTC IRC network, and later posted a round-up of security initiatives in Alpine during June which mentions, amongst many other things, the ability to demonstrate reproducible Alpine install images for the Raspberry Pi.

Elsewhere in Alpine news, kpcyrd posted a series of Tweets explaining the steps he took to build a reproducible Alpine image. [1] [2]

For openSUSE, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

The NixOS Linux distribution pulled off a technical and publicity coup this month by announcing that the iso_minimal.x86_64-linux image is 100% reproducible. The announcement was widely discussed on Hacker News, where the article has received in excess of 200 comments.

In early June, Nilesh Patra asked for help making Debian’s brian package build reproducibly. Felix C. Stegerman proposed two patches which seem to have fixed the remaining issues (#989693). These were submitted upstream, where they were merged shortly afterwards.

Felix C. Stegerman announced the release of v1.0.0 of apksigcopier, a tool to copy, extract and patch .apk signatures needed to facilitate reproducible builds on the F-Droid Android application store. Holger Levsen subsequently sponsored an upload to Debian. Felix C. Stegerman also reported that Android builds are sometimes not reproducible due to a bug in Android’s coreLibraryDesugaring. […]

Elsewhere in F-Droid, the Swiss COVID Certificate mobile app (which uses reproducible builds) has been added to F-Droid — the F-Droid developers have mentioned that the upstream developers have been very helpful in making this happen. Relatedly, the Android version of the Electrum Bitcoin Wallet has been made reproducible.

Lastly, Hannes Mehnert announced the launch of the reproducible MirageOS build infrastructure, together with where to obtain ‘unikernels’: “To provide a high level of assurance and trust, if you distribute binaries in 2021, you should have a recipe how they can be reproduced in a bit-by-bit identical way.”

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:
- deepdiff (report a ‘build failure in 2022’ issue)
- dulwich (build fails in the future due to expired GPG key)
- gtksourceview4 (report that build fails in uniprocessor machine)
- ipxe (ar(1) call needs to be deterministic)
- json-lib (report a date / epoch issue)
- kernel-default (two sorting and random-related issues)
- lepton (drop call to -march=native)
- lighttpd1 (build fails in 2036)
- openvas-smb (date and Portable Executable timestamp issue)
- python-MapProxy (report a ‘build fails on uniprocessor machine’ issue)
- python-gcsfs (report a ‘build fails on uniprocessor machine’ issue)
Nilesh Patra:
- #989572 filed against gl2ps.
- #989583 filed against liblip.
- #989693 filed against brian.
Vagrant Cascadian:
- #989963 filed against tclap.
- #989965 filed against gtk-sharp3.
- #989966 filed against gtk-sharp3.
- #990084 filed against graphicsmagick.
- #990246, #990247 and #990248 filed against vlc.
- #990253 filed against pmix.
- #990254 filed against openmpi.
- #990300 filed against auctex.
- #990323 filed against volume-key.
- #990327 filed against cppunit.
- #990329 filed against rpm.
- #990332 filed against libcddb.
- #990338 filed against autogen.
- #990339 filed against matplotlib.

Separate to this, Hans-Christoph Steiner noted there is a reproducibility-related bug in Python’s standard zipfile library. This problem makes it hard to create reproducible .zip files. In particular, Hans would like to have more input from Python people, since it is not clear how best to resolve the problem.
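As background on why .zip output tends to vary between builds, here is a minimal sketch of writing an archive deterministically with the standard zipfile module. It does not reproduce or work around the specific bug Hans-Christoph reported; it only pins the usual sources of variation (entry order, timestamps, permissions and host platform). The `build_deterministic_zip` function and the `FIXED_DATE_TIME` constant are names invented for this example.

```python
import io
import zipfile

# Assumption for this sketch: pin every entry to 1980-01-01, the earliest
# date the ZIP format can represent.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_deterministic_zip(files):
    """Build a .zip archive in memory from a {name: bytes} mapping.

    Entries are written in sorted order with a fixed timestamp and fixed
    permissions, so the same input yields the same bytes on the same
    Python/zlib installation.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o100644 << 16  # regular file, rw-r--r--
            info.create_system = 3  # record "Unix" regardless of host platform
            archive.writestr(info, files[name])
    return buffer.getvalue()
```

Note that the compressed bytes can still differ between zlib versions or implementations, so switching to `zipfile.ZIP_STORED` may be preferable where archive size is not a concern.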

diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it provides human-readable diffs from many kinds of binary formats.
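Before reaching for a content-aware diff, the usual first question is simply whether two build outputs are bit-for-bit identical. The sketch below shows one way to check that with SHA-256 checksums from Python's standard library. It is not part of diffoscope, and the function names `sha256_of` and `is_reproducible` are invented for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(first_artifact, second_artifact):
    """Return True if two build outputs are bit-for-bit identical."""
    return sha256_of(first_artifact) == sha256_of(second_artifact)
```

When the digests differ, a checksum says nothing about where or why; that is the gap a tool such as diffoscope is designed to fill.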

This month, Chris Lamb made a number of changes, including releasing version 177. In addition, Chris updated the try.diffoscope.org service to reflect that Bytemark were acquired by the Iomart Group. […]

Balint Reczey:
- Support .deb package members that are compressed with the Zstandard compression algorithm. […]
Jean-Romain Garnier:
- Overhaul the Mach-O executable file comparator. […][…][…][…][…]
- Implement tests for the Mach-O comparator. […][…][…]
- Switch to new argument format for the LLVM compiler. […]
- Fix test_libmix_differences in testsuite for the ELF format. […][…]
- Improve macOS compatibility for the Mach-O comparator. […]
- Add llvm-readobj and llvm-objdump to the internal EXTERNAL_TOOLS data structure. […]
Mattia Rizzolo:
- Invoke gzip(1) with its ‘short’ option names in order to support Busybox’s version of the utility. […]

Website and documentation

A number of changes were made to the main Reproducible Builds website and documentation this month, including:

Arnout Engelen:
- Credit Ludovic Courtès for the Guix page. […]
- Fix link to NixOS. […]
Chris Lamb:
- Use an ellipsis […] and drop a full stop […] to clarify ‘more items’ links.
- Update the link and logo to Google Open Source Security Team. […]
- Reduce the amount of bold text on the homepage. […]
- Document the non-reproducibility arising from abbreviated Git hashes depending on the number of total objects in a Git repository. […]
Hervé Boutemy:
- Add a Reproducible Central section to the JVM page. […]
Holger Levsen:
- Add busybox to the list of software respecting the SOURCE_DATE_EPOCH environment variable for build timestamps if available. […]
Mattia Rizzolo:
- Fix a typo in a CSS class name. […]
- Add the (now-superseded) Linux Foundation Core Infrastructure Initiative to the list of historical sponsors. […]

Testing framework

The Reproducible Builds project operates a Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

Holger Levsen:
- Debian-related changes:
  - Initial stab at building and comparing Debian Live images. […]
  - Run the lb build Debian Live command with sudo(8). […][…]
  - Use safer and more common rm -rf syntax in/around Debian Live images. […]
  - Sync build results of Live images to our Jenkins instance. […]
  - Create a Debian unstable schroot for running diffoscope on the osuosl173 node so it can be used to test Debian Live images. […]
  - Cope with the Tails build manifests now only containing binary package names. […]
  - Do not incorrectly detect diskspace issues on OpenSSL builds. […]
  - Delete the reproducible_compare_Debian_sha1sums jobs. […]
- Automatic node health check improvements:
  - Detect non-fatal failures using a HTTP(S) proxy. […]
  - Detect failure to “make tools”. […]
  - Also detect “no route to host” issues. […]
  - Tune regular expression to detect proxy failures. […][…][…]
  - Misc aesthetic changes to the status page. […][…][…]
- Misc:
  - Configure the needrestart tool to restart all services automatically. […][…][…]
  - Increase the Linux kernel inotify watch limit further on all hosts. […]
  - Be more verbose when cloning Coreboot Git repository. […][…]
  - Properly delete old schroot overlays. […]
Mattia Rizzolo:
- Update the documentation regarding manually scheduling Debian builds to drop old references to the deprecated Alioth system. […]
- Update a number of IP addresses for armhf architecture machines. […]
Roland Clobus spent significant time on automatically building Debian Live images twice and comparing the output if they differ (Jenkins job page). This included:
- Actually build the images twice and compare the output. […][…]
- Improve cleanup routines. […][…][…][…]
- Store the ISO output. […][…][…]
- Various sudo(8)-related configuration changes. […][…][…][…]
Vagrant Cascadian:
- Document the access to the armhf architecture host servers. […][…]
- Update the number of armhf architecture jobs and machines. […]
- Add build jobs and SSH keys (etc.) for various new machines. […][…]

Finally, build node maintenance was performed by Holger Levsen […][…][…], Mattia Rizzolo […][…][…][…] and Vagrant Cascadian […].

Misc development news

Dan Shearer from the LumoSQL database project posted to the rb-general mailing list about reproducibility and microcode updates, emphasis ours:

Here at LumoSQL we do repeated runs testing SQLite of various versions and configurations, storing the results in an SQLite database. Here is an example of the kind of variation that justifies what some have called our ‘too-fussy’ test suite, a microcode update that changes behaviour from one day to another.

Finally, in last month’s report we wrote about Paul Spooren proposing a patch for the BusyBox suite of UNIX utilities so that it uses SOURCE_DATE_EPOCH for build timestamps if available. This was merged during June by Denys Vlasenko.
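For readers unfamiliar with the convention, the sketch below shows what honouring SOURCE_DATE_EPOCH typically looks like in a build script: use the variable's value (seconds since the Unix epoch) when it is set, and fall back to the current time otherwise. This is an illustration in Python, not the BusyBox patch itself; the function names `build_timestamp` and `build_date_string` are invented for this example.

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the build time as whole seconds since the Unix epoch.

    Uses SOURCE_DATE_EPOCH when it is set, otherwise the current time.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def build_date_string(environ=os.environ):
    """Format the build time in UTC so the local timezone cannot leak in."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(build_timestamp(environ)))
```

Formatting in UTC matters as much as pinning the timestamp, since a build that embeds local time would still differ between machines in different timezones.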

If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can also get in touch with us via:

IRC: #reproducible-builds on irc.oftc.net.
Twitter (@ReproBuilds) and Mastodon (@reproducible_builds@fosstodon.org).
Reddit: /r/ReproducibleBuilds
Mailing list: rb-general@lists.reproducible-builds.org
