Reproducible Builds in July 2021

View all our monthly reports


Welcome to the latest report from the Reproducible Builds project. In this post, we round up the important things that happened in the world of reproducible builds in July 2021. As always, if you are interested in contributing to the project, please visit the Contribute page on our website.

On Friday 27th August, Duc Ly Vu, Fabio Massacci, Ivan Pashchenko, Henrik Plate and Antonino Sabetta will present a paper at the ACM Foundations of Software Engineering (ESEC/FSE) conference. Titled LastPyMile: Identifying the Discrepancy between Sources and Packages, the abstract of the talk mentions that:

Our empirical assessment of 2,438 popular packages in PyPI with an analysis of around 10M lines of code shows several differences in the wild: modifications cannot be just attributed to malicious injections. Yet, scanning again all and whole ‘most likely good but modified’ packages is hard to manage for FOSS downstream users. We propose a methodology, LastPyMile, for identifying the differences between build artifacts of software packages and the respective source code repository. []
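The paper describes LastPyMile's actual methodology. As a much simpler illustration of the underlying idea, the hypothetical Python sketch below compares a built wheel against a checkout of its source repository and sorts each packaged file into "identical", "modified" or "added". The function name is made up for this example, and it assumes that paths inside the wheel map directly onto the repository layout. Real projects often break that assumption (for example with `src/` layouts or generated files).

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def diff_artifact_against_source(wheel_path: str, source_root: str) -> dict:
    """Classify each file in a wheel relative to a source checkout.

    Hypothetical helper: assumes wheel paths map 1:1 onto paths under
    source_root, which is a simplification of real packaging layouts.
    """
    root = Path(source_root)
    report = {"identical": [], "modified": [], "added": []}
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            # Skip directory entries and packaging metadata that never exists in the repo.
            if name.endswith("/") or ".dist-info/" in name:
                continue
            candidate = root / name
            if not candidate.is_file():
                # Present in the artifact but not in the repository.
                report["added"].append(name)
            elif sha256_bytes(wheel.read(name)) == sha256_bytes(candidate.read_bytes()):
                report["identical"].append(name)
            else:
                report["modified"].append(name)
    return report
```

Files in the "added" and "modified" buckets are the ones worth a closer look. The abstract notes that such differences are not necessarily malicious.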


Last month, we linked to Ars Technica’s report that counterfeit packages on PyPI, the official Python package repository, contained secret code that installed cryptomining software on infected machines. This month, Dan Goodin reported on another PyPI malware incident: in Software downloaded 30,000 times from PyPI ransacked developers’ machines, he describes a number of malicious payloads (such as Discord token and credit card ‘stealers’) that appear to have targeted programmers’ computers. (Another source.)


Joshua Lock published the first part of a two-part security-related series on the VMware Open Source blog. Titled First Steps for Securing the Software Supply Chain, the post mentions:

The Reproducible Builds project develops tools, documentation, standards and patches for upstream open source projects that enable the production of bit-for-bit identical builds given the same inputs. This is no small feat, as many things influence the output of a build. The project’s major initial innovation was recognizing that the time at which a build runs is embedded into multiple artifacts produced during that build. It defined a standard way of fixing time for a build, called SOURCE_DATE_EPOCH, that more and more projects are adopting, and which removes a major source of non-deterministic output.
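To make the SOURCE_DATE_EPOCH convention concrete, here is a minimal Python sketch of how a build tool might honour it. The function name `build_timestamp` is hypothetical. The behaviour follows the convention described above: use the variable when it is set and fall back to the current time otherwise. A malformed value simply raises `ValueError` here rather than being silently ignored.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Return the timestamp to embed in build artifacts (hypothetical helper).

    Honours SOURCE_DATE_EPOCH (integer seconds since the Unix epoch) when set,
    so repeated builds embed the same value; otherwise uses the current time.
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    # int() raises ValueError on a malformed value instead of silently ignoring it.
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Always use UTC so the build machine's timezone does not leak into the output.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

For example, running a build with `SOURCE_DATE_EPOCH=1625097600` makes `build_timestamp()` return 2021-07-01 00:00:00 UTC on every machine, regardless of when or where the build runs.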

Joshua also mentions our sister project, Bootstrappable Builds, as well as a number of other reproducibility-adjacent tools such as the Bazel build system.


Touching on Bazel, Gaspare Vitta recently gave a talk at Conf42 Python 2021 titled Reproducible Builds with Bazel. In the abstract for his talk, Gaspare writes:

If you run two builds with the same source code and the same commit but on two different machines, do you expect to get the same result? Well, in most cases you will not! In this talk, we’ll identify sources of non-determinism in most build processes and look at how Bazel can be used to create reproducible, hermetic builds. We’ll then create a reproducible Flask application that can be built with Bazel so that the Python interpreter and all dependencies are hermetical.


Lastly, it was noticed that Manuel Pöll’s thesis at the Johannes Kepler University in Linz, Austria, is now available online. Titled An Investigation Into Reproducible Builds for AOSP (PDF), Manuel’s thesis covers techniques for achieving deterministic builds of AOSP, the Android Open Source Project on which Google’s Android is based.


Community updates

We held a productive meeting on IRC this month (original announcement) that ran for just under two hours. A full set of notes from the meeting is available.

Chris Lamb updated the main Reproducible Builds website and documentation this month, including migrating the old ‘history’ page from the Debian wiki [] and making the emphasis on 2020 less prominent on the events page [], in addition to many other changes. Also, Holger Levsen added MirageOS to our projects page [][] and Tobias Stoeckmann noted that the #archlinux-reproducible IRC channel has moved to the libera.chat network [].

A number of members of the Reproducible Builds team are building an ‘ecosystem map’ in order to better understand the relationships between projects in and around reproducible builds. This month, Chris Lamb posted a request to our mailing list to solicit input from the wider community.


Software development

diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb released version 178 and version 179, and made the following changes:

  • Ensure that various LLVM tools are installed, even when testing whether a macOS binary has no differences compared to itself. (#270)
  • Rewrite how we calculate the ‘fuzzy hash’ of a file to make the control flow cleaner. [][]
  • Don’t traceback when encountering a broken symlink within a directory. (#269)
  • Update some copyright years. []
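The broken-symlink fix above concerns a common pitfall: following a symlink whose target does not exist raises an exception. The sketch below is not diffoscope's code. It is a hypothetical Python helper showing the general technique of checking `is_symlink()` before treating an entry as a file or directory, so that a dangling link is reported instead of causing a traceback.

```python
import os
from pathlib import Path


def list_entries(directory: str) -> list[tuple[str, str]]:
    """Describe each directory entry without crashing on broken symlinks.

    Hypothetical helper. Returns (name, kind) pairs sorted by name, where kind
    is 'file', 'dir', 'symlink -> TARGET' or 'broken symlink -> TARGET'.
    """
    entries = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_symlink():
            # readlink() reads the link itself, so it works even when the target is missing.
            target = os.readlink(path)
            # exists() follows the link, so it is False for a dangling symlink.
            kind = "symlink" if path.exists() else "broken symlink"
            entries.append((path.name, f"{kind} -> {target}"))
        elif path.is_dir():
            entries.append((path.name, "dir"))
        else:
            entries.append((path.name, "file"))
    return entries
```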

In addition, Edward Betts updated the try.diffoscope.org service to add an HTML alt attribute to an image. []


Debian

Roland Clobus sent a second status update on his progress towards fully reproducible ‘Live’ ISO images. Amongst many other things, Roland mentions that all major configurations are now built on a daily basis and that only the Cinnamon image is not yet reproducible. However, diffoscope has issues when comparing the results; work is in progress to address this (#991059).
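At its core, checking whether an image is reproducible means building it twice, independently, and confirming that the results are bit-for-bit identical. diffoscope is only needed afterwards, to explain *why* two builds differ. The hypothetical Python sketch below shows just the first step. It hashes each image in fixed-size chunks, so multi-gigabyte ISO files never have to fit in memory.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a potentially very large file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(first_build: str, second_build: str) -> bool:
    """True when two independently built images are bit-for-bit identical."""
    return sha256_file(first_build) == sha256_file(second_build)
```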

Two reviews of Debian packages were added, 50 were updated and 33 were removed this month, adding to our knowledge about identified issues. Three issue types were also updated: nondeterminism_in_autolex_bin is now fixed in Debian bullseye [], a new test_suite_logs issue was added [] and the description of the records_build_flags issue was updated [].

Helmut Grohne and Johannes Schauer Marin Rodrigues reported Debian bug #990712: “While working on DPKG_ROOT reproducibility, we observed that the [dpkg] trigger database differs for the foreign and native case”. []

Chris Lamb modified the Lintian static analyser for Debian packages to check for Python tracebacks in manual pages. These are usually caused by failing help2man calls and, crucially, cause reproducibility issues as the traceback includes absolute path names []. Lastly, Holger filed Debian bug #991285 to ‘unblock’ version 1.12-0.1 of strip-nondeterminism in order to ensure that this version ended up in the upcoming release of Debian bullseye.
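Lintian itself is written in Perl, and its actual check is not reproduced here. As a hypothetical illustration of the idea, the Python sketch below looks for a Python traceback in a (possibly gzip-compressed) manual page and returns the absolute paths leaked by its stack frames. Those paths are what make the page vary between build environments. The function name and the frame regex are assumptions for this example. Roff escapes such as `\-` inside paths are not handled.

```python
import gzip
import re

TRACEBACK_HEADER = "Traceback (most recent call last):"
# Frame lines look like:  File "/build/pkg/cli.py", line 3, in <module>
FRAME = re.compile(r'File "(/[^"]+)", line \d+')


def traceback_paths(manpage: str) -> list[str]:
    """Return absolute paths leaked by a Python traceback in a manual page.

    Handles plain and .gz pages; an empty list means no traceback was found.
    """
    opener = gzip.open if manpage.endswith(".gz") else open
    with opener(manpage, "rt", encoding="utf-8", errors="replace") as page:
        text = page.read()
    if TRACEBACK_HEADER not in text:
        return []
    return FRAME.findall(text)
```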


Mobile development

It was noticed that from August 2021, Android ‘app bundles’ will become mandatory on the Google Play Store. This will result in smaller file sizes and other advantages for end users, but it also means that app developers will have to produce equivalent ‘APK’ versions of their apps for distribution channels other than the Play Store. In addition, developers will need to supply Google with their app signing keys. The introduction of code transparency for app bundles adds an optional code signing and verification mechanism (using a separate signing key held solely by the app developer). Unfortunately, code transparency files are not verified at install time (only manual verification is currently possible), and they only guarantee the integrity of DEX and native code files, meaning that interpreted code and assets could still have been modified. Further information can be found in the announcements on the Android Authority and XDA Developers sites.
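The hypothetical Python sketch below illustrates the coverage limitation described above. The real code transparency file is signed with the developer's key and is checked with Google's tooling. In this sketch, a plain dictionary of SHA-256 digests stands in for it, and signature verification is omitted entirely. The point is only to show which files such a check can vouch for: DEX files and native libraries are compared, while everything else (assets, scripts, resources) ends up in an "unverified" bucket.

```python
import hashlib
import zipfile


def is_covered(name: str) -> bool:
    """DEX files and native libraries are covered; everything else is not."""
    return name.endswith(".dex") or (name.startswith("lib/") and name.endswith(".so"))


def check_code_hashes(apk_path: str, expected: dict[str, str]) -> dict[str, list[str]]:
    """Compare covered APK entries against a hypothetical digest manifest.

    'expected' maps entry names to SHA-256 hex digests; verifying the
    signature over that manifest is deliberately left out of this sketch.
    """
    result = {"mismatched": [], "missing": [], "unverified": []}
    with zipfile.ZipFile(apk_path) as apk:
        names = set(apk.namelist())
        for name in sorted(names):
            if not is_covered(name):
                # Outside the guarantee: could be modified without detection.
                result["unverified"].append(name)
            elif hashlib.sha256(apk.read(name)).hexdigest() != expected.get(name):
                result["mismatched"].append(name)
        # Entries listed in the manifest but absent from the APK.
        result["missing"] = sorted(set(expected) - names)
    return result
```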

In addition, the Jiten Japanese Dictionary and Bitcoin Wallet applications on the F-Droid application store are now reproducible using signatures in metadata. Lastly, it was noticed that the Android library bug affecting NewPipe also affects the Swiss Covid Certificate app.


Other distributions

Jelle van der Waa published a blog post detailing recent progress on reproducibility-related issues in Arch Linux, including issues with compressed manual pages as well as embedded build dates and hostnames. kpcyrd also posted a monthly report mentioning reproducibility-related issues in Arch, in addition to documenting his progress towards reproducible Alpine Linux on the Raspberry Pi.
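Compressed manual pages are a classic source of unreproducibility because the gzip header can record a modification time and the original filename. On the command line, `gzip -n` omits both. The minimal Python sketch below shows the equivalent approach. The helper name is hypothetical. Passing `mtime=0` to `gzip.compress()` keeps the current time out of the header, and no filename is stored. The output is only expected to be stable for a given Python and zlib version, since other header fields and the compressed stream itself can differ between versions.

```python
import gzip


def compress_manpage(data: bytes) -> bytes:
    """Gzip data deterministically (hypothetical helper).

    mtime=0 stops the current time from being written into the gzip header,
    so compressing the same input twice yields identical bytes.
    """
    return gzip.compress(data, compresslevel=9, mtime=0)
```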

Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE.


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:


Testing framework

Reproducible Builds runs a Jenkins-based testing framework that powers tests.reproducible-builds.org. The following changes were made this month:

  • Alexander Couzens:

    • Correct OpenWrt-related log artifacts in a failure case. []
  • Holger Levsen:

    • Create a new view of Debian Live jobs maintained by Roland Clobus.
    • Randomize the start time of the Debian Live image building. []
    • Only run the Debian ‘rebuilder prototype’ on demand; it has mostly served its purpose. [][]
    • Detect diffoscope failures in the health check. [][]
    • Build packages with less parallelism on the i386 architecture to reduce load. [][]
    • Improve output of reproducible OpenWrt-related jobs. []
    • Report in the health check when a node is low on disk space, as a reminder to remove old kernels. []
    • Add retired armhf architecture nodes to our definition of ‘zombies’. []
  • Mattia Rizzolo:

  • Roland Clobus:

    • Build all Debian ‘live’ images. []
    • Allow diffoscope to run for longer as the image is currently not reproducible. []
  • Vagrant Cascadian:

    • Default to using a tmpfs-backed /tmp directory for schroots. []
    • Retire most armhf architecture nodes with only 2GB of RAM. []
    • Match armhf nodes named ff* in the common-functions script. []
    • Update number of armhf boards used for reproducible builds in the documentation. []



If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. Alternatively, you can get in touch with us via:





Follow us on Twitter @ReproBuilds, Mastodon @reproducible_builds@fosstodon.org & Reddit and please consider making a donation. • Content licensed under CC BY-SA 4.0, style licensed under MIT. Templates and styles based on the Tor Styleguide. Logos and trademarks belong to their respective owners. • Patches for this website welcome via our Git repository (instructions) or via our mailing list. • Full contact info