Welcome to the latest report from the Reproducible Builds project. In this post, we round up the important things that happened in the world of reproducible builds in July 2021. As always, if you are interested in contributing to the project, please visit the Contribute page on our website.
On Friday 27th August, Duc Ly Vu, Fabio Massacci, Ivan Pashchenko, Henrik Plate and Antonino Sabetta will present a paper at the ACM Foundations of Software Engineering (ESEC/FSE) conference. The paper is titled LastPyMile: Identifying the Discrepancy between Sources and Packages, and its abstract mentions that:
Our empirical assessment of 2,438 popular packages in PyPI with an analysis of around 10M lines of code shows several differences in the wild: modifications cannot be just attributed to malicious injections. Yet, scanning again all and whole ‘most likely good but modified’ packages is hard to manage for FOSS downstream users. We propose a methodology, LastPyMile, for identifying the differences between build artifacts of software packages and the respective source code repository. […]
Last month, we linked to Ars Technica’s report that counterfeit packages on PyPI, the official Python package repository, contained secret code that installed cryptomining software on infected machines. This month, Dan Goodin reported on another PyPI malware issue: in Software downloaded 30,000 times from PyPI ransacked developers’ machines, Dan writes about a number of malicious payloads (such as Discord token and credit card ‘stealers’) that appear to have targeted programmers’ computers. (Another source.)
Joshua Lock published the first part of a two-part security series on the VMware Open Source blog. In the post, titled First Steps for Securing the Software Supply Chain, Joshua writes:
The Reproducible Builds project develops tools, documentation, standards and patches for upstream open source projects that enable the production of bit-for-bit identical builds given the same inputs. This is no small feat, as many things influence the output of a build. The project’s major initial innovation was recognizing that the time at which a build runs is embedded into multiple artifacts produced during that build. It defined a standard way of fixing time for a build, called SOURCE_DATE_EPOCH, that more and more projects are adopting, and which removes a major source of non-deterministic output.
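As a rough illustration of what honouring SOURCE_DATE_EPOCH looks like inside a build tool, here is a minimal Python sketch. The helper name `build_timestamp` and its `environ` parameter are hypothetical and not part of any project’s API. The sketch assumes the basic rule of the specification: when the variable is set, it holds a whole number of seconds since the Unix epoch and is used instead of the current time, a malformed value is an error, and the result is rendered in UTC so the builder’s timezone cannot leak into the output.

```python
from __future__ import annotations

import os
import re
import time
from datetime import datetime, timezone
from typing import Mapping


def build_timestamp(environ: Mapping[str, str] = os.environ) -> datetime:
    """Return the timestamp to embed in build artifacts, as an aware UTC datetime.

    Hypothetical helper: uses SOURCE_DATE_EPOCH (whole seconds since
    1970-01-01 00:00:00 UTC) when it is set, otherwise the current time.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        # Not a reproducible build environment: fall back to the wall clock.
        seconds = int(time.time())
    elif re.fullmatch(r"[0-9]+", value):
        seconds = int(value)
    else:
        # Fail loudly rather than silently embedding the current time.
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {value!r}")
    # Always convert in UTC so the local timezone cannot leak into the output.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(build_timestamp({"SOURCE_DATE_EPOCH": "1625097600"}).isoformat())
    # prints 2021-07-01T00:00:00+00:00
```

Passing the environment in as a mapping is only a convenience for testing; a real tool would simply read `os.environ`.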
Joshua also mentions our sister Bootstrappable Builds project, as well as a number of other reproducibility-adjacent tools such as the Bazel build system.
On the subject of Bazel, Gaspare Vitta recently gave a talk titled Reproducible Builds with Bazel at Conf42 Python 2021. In the abstract for his talk, Gaspare writes:
If you run two builds with the same source code and the same commit but on two different machines, do you expect to get the same result? Well, in most cases you will not! In this talk, we’ll identify sources of non-determinism in most build processes and look at how Bazel can be used to create reproducible, hermetic builds. We’ll then create a reproducible Flask application that can be built with Bazel so that the Python interpreter and all dependencies are hermetical.
Lastly, it was noticed that Manuel Pöll’s thesis at the Johannes Kepler University in Linz, Austria is now available online. Titled An Investigation Into Reproducible Builds for AOSP (PDF), Manuel’s thesis touches on techniques to achieve deterministic builds of AOSP, the Android Open Source Project on which Google’s Android is based.
Community updates
We held a productive meeting on IRC this month (original announcement) that ran for just under two hours. A full set of notes from the meeting is available.
Chris Lamb updated the main Reproducible Builds website and documentation this month, including migrating the old ‘history’ page from the Debian wiki […] and making the emphasis on 2020 less prominent on the events page […], in addition to many other changes. Holger Levsen also added MirageOS to our projects page […][…], and Tobias Stoeckmann noted that the #archlinux-reproducible IRC channel has moved to the libera.chat network […].
Several members of the Reproducible Builds team are in the process of building an ‘ecosystem map’ in order to better understand the relationships between projects in and around reproducible builds. This month, Chris Lamb posted a request to our mailing list soliciting input from the wider community.
Software development
diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb released version 178 and version 179, and made the following changes:
- Ensure that various LLVM tools are installed, even when testing whether a macOS binary has no differences compared to itself. (#270)
- Rewrite how we calculate the ‘fuzzy hash’ of a file to make the control flow cleaner. […][…]
- Don’t traceback when encountering a broken symlink within a directory. (#269)
- Update some copyright years. […]
In addition, Edward Betts updated the try.diffoscope.org service to add an HTML alt attribute to an image. […]
Debian
Roland Clobus sent a second status update on his progress towards fully-reproducible ‘Live’ ISO images. Amongst many other things, Roland mentions that all major configurations are now built on a daily basis and only the Cinnamon image is not reproducible. However, diffoscope has trouble comparing the results; work to address this is in progress (#991059).
Two reviews of Debian packages were added, 50 were updated and 33 were removed this month, adding to our knowledge about identified issues. Three issue types were also changed: nondeterminism_in_autolex_bin is now fixed in Debian bullseye […], a new test_suite_logs issue was added […] and the description of the records_build_flags issue was updated […].
Helmut Grohne and Johannes Schauer Marin Rodrigues reported Debian bug #990712: “While working on DPKG_ROOT reproducibility, we observed that the [dpkg] trigger database differs for the foreign and native case”. […]
Chris Lamb modified the Lintian static analyser for Debian packages to check for Python tracebacks in manual pages. These are usually caused by failing help2man calls and, crucially, cause reproducibility issues because the traceback includes absolute path names […]. Lastly, Holger filed Debian bug #991285 to ‘unblock’ version 1.12-0.1 of strip-nondeterminism to ensure that this version ends up in the upcoming release of Debian bullseye.
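To make the Lintian check concrete, here is a rough Python sketch of the same idea. It is not Lintian’s implementation (Lintian is written in Perl), and the names `traceback_paths`, `TRACEBACK_HEADER` and `ABSOLUTE_FRAME_RE` are hypothetical. It assumes the manual page text is already available as a string, and uses a simple heuristic: look for the standard Python traceback header, then collect the absolute paths that appear in its `File "…", line N` frames, since those paths are what vary between build environments.

```python
from __future__ import annotations

import re

TRACEBACK_HEADER = "Traceback (most recent call last):"
# A traceback frame looks like:  File "/abs/path/to/file.py", line 12, in <module>
ABSOLUTE_FRAME_RE = re.compile(r'File "(/[^"]+)", line \d+')


def traceback_paths(manpage: str) -> list[str] | None:
    """Inspect the text of a manual page for an embedded Python traceback.

    Returns None if no traceback header is present, otherwise the list of
    absolute file paths mentioned in its frames (for example, paths into the
    build directory, which make the page unreproducible).
    """
    if TRACEBACK_HEADER not in manpage:
        return None
    return ABSOLUTE_FRAME_RE.findall(manpage)


if __name__ == "__main__":
    page = (
        ".SH DESCRIPTION\n"
        "Traceback (most recent call last):\n"
        '  File "/build/foo-1.0/foo.py", line 3, in <module>\n'
        "ImportError: No module named bar\n"
    )
    print(traceback_paths(page))  # ['/build/foo-1.0/foo.py']
```

Returning `None` rather than an empty list keeps “no traceback at all” distinct from “a traceback with no absolute paths”, which a checker might want to report differently.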
Mobile development
It was noticed that, from August 2021, Android ‘app bundles’ will become mandatory on the Google Play Store. Although this will result in smaller file sizes and other advantages for end users, it will also require app developers to publish equivalent ‘APK’ versions of their apps on other, non-Play Store channels. It will also mean that developers will need to supply Google with their app signing keys. The introduction of code transparency for app bundles does add an optional code signing and verification mechanism (using a separate signing key held solely by the app developer). Unfortunately, code transparency files are not verified at install time (only manual verification is currently possible), and they only guarantee the integrity of DEX and native code files, meaning that interpreted code and assets could still have been modified. Further information can be found in the announcements on the Android Authority and XDA Developers sites.
In addition, the Jiten Japanese Dictionary and Bitcoin Wallet applications on the F-Droid application store are now reproducible using signatures in metadata. Lastly, it was noticed that the Android library bug affecting NewPipe also affects the Swiss Covid Certificate app.
Other distributions
Jelle van der Waa published a blog post detailing recent progress on reproducibility-related issues in Arch Linux, including issues with compressed manual pages as well as embedded build dates and hostnames. kpcyrd also posted a monthly report mentioning reproducibility-related issues in Arch, in addition to documenting his progress towards reproducible Alpine Linux on the Raspberry Pi.
Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - containerd (parallelism and Golang buildid issues)
  - curl (build fails in 2030)
  - duplicity (date-related issue)
  - google-guest-agent (parallelism and Golang buildid issues)
  - google-osconfig-agent (parallelism and Golang buildid issues)
  - guile-git (parallelism / Guile)
  - latex2html (report PID-nondeterminism)
  - lxd (parallelism and Golang buildid issues)
  - monitoring-plugins (drop date from gettextize)
  - perl-Web-Machine (build failure in 2036)
  - starlette (report build failure in single-core VM)
  - sudoku-sensei (ASLR-issue via toolchain component: reported upstream)
  - watchdog (report build failure in single-core VM)
- Jelle van der Waa:
  - percona-toolkit (date)
  - skaffold (date)
- Nilesh Patra:
- Richard Purdie:
  - python-setuptools: Sort the output of glob.glob as it inherits the nondeterministic ordering of os.listdir and the underlying filesystem. […]
- Vagrant Cascadian:
  - #990339 previously filed against matplotlib (now submitted upstream).
  - #990839 filed against opentest4j.
  - #990840 filed against apiguardian.
  - #990843 and #990844 filed against libtheora.
  - #990858 filed against dask.
  - #990862 filed against infinipath-psm.
  - #990910 filed against p7zip.
  - #990912 filed against perl-tk.
  - #990914 filed against lcov.
  - #990952, #990953 and #990969 filed against lxml.
  - #990999 filed against biber.
  - #991001 and #991002 filed against automake1.11.
  - #991020 filed against gcc-mingw-w64.
  - #991104 and #991106 filed against antlr.
  - #991177 filed against libdebian-installer.
  - #991180 filed against xaw3d.
  - #991181 filed against cmocka.
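Richard Purdie’s python-setuptools change follows a general pattern: glob.glob() returns matches in whatever order the underlying directory listing produces, so any code that iterates over its results (for example, to build a file list for an archive) should sort them first. The minimal Python sketch below illustrates this; `deterministic_glob` is a hypothetical helper, not the actual setuptools patch.

```python
from __future__ import annotations

import glob
import os
import tempfile


def deterministic_glob(pattern: str) -> list[str]:
    """Like glob.glob(), but with a stable, filesystem-independent order."""
    # glob.glob() yields matches in whatever order the directory listing
    # returns them, which can differ between filesystems and machines.
    # sorted() compares str by code point, so it is also locale-independent.
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create files in a deliberately non-alphabetical order.
        for name in ("c.py", "a.py", "b.py"):
            open(os.path.join(tmp, name), "w").close()
        matches = deterministic_glob(os.path.join(tmp, "*.py"))
        print([os.path.basename(m) for m in matches])  # ['a.py', 'b.py', 'c.py']
```

Sorting by plain string comparison, rather than with a locale-aware collation, matters here: a locale-dependent order would simply swap one source of nondeterminism for another.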
Testing framework
Reproducible Builds runs a Jenkins-based testing framework that powers tests.reproducible-builds.org. The following changes were made this month:
- Alexander Couzens:
  - Correct OpenWrt-related log artifacts in a failure case. […]
- Holger Levsen:
  - Create a new view of Debian Live jobs maintained by Roland Clobus.
  - Randomize the start time of the Debian Live image builds. […]
  - Only run the Debian ‘rebuilder prototype’ on demand; it has mostly served its purpose. […][…]
  - Detect diffoscope failures in the health check. […][…]
  - Build packages with less parallelism on the i386 architecture to reduce load. […][…]
  - Improve output of reproducible OpenWrt-related jobs. […]
  - Report nodes that are low on disk space in the health check, reminding us to remove old kernels. […]
  - Add retired armhf architecture nodes to our definition of ‘zombies’. […]
- Mattia Rizzolo:
  - Share the same Apache web server settings between debian and debian_live_build artifacts. […]
- Roland Clobus:
  - Build all Debian ‘live’ images. […]
  - Allow diffoscope to run for longer, as the image is currently not reproducible. […]
- Vagrant Cascadian:
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter (@ReproBuilds) and Mastodon (@reproducible_builds@fosstodon.org).
- Reddit: /r/ReproducibleBuilds
- Mailing list: rb-general@lists.reproducible-builds.org