Welcome to the January 2020 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to. In this month’s report, we cover:
- Upstream news & event coverage — Reproducing the Telegram messenger, etc.
- Software development — Updates and improvements to our tooling
- Distribution work — More work in Debian, openSUSE & friends
- Misc news — From our mailing list & how to get in touch etc.
What are reproducible builds?
Whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.
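As a rough illustration of how third parties might check for such a consensus, the sketch below hashes artifacts produced by independent builders and reports whether they are all bit-for-bit identical. The function names `sha256_of` and `builds_agree` are hypothetical and are not part of any Reproducible Builds tooling.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(artifact_paths):
    """Return True if every artifact has the same SHA-256 digest."""
    digests = {sha256_of(path) for path in artifact_paths}
    return len(digests) == 1
```

A single differing digest is enough to show that at least one build diverged, at which point a tool such as diffoscope can be used to find out why.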
If you are interested in contributing, please visit the Contribute page on our website.
Upstream news & event coverage
The Telegram messaging application has documented full instructions for verifying that its original source code is exactly the same code that is used to build the versions available on the Apple App Store and Google Play.
Reproducible builds were mentioned in a panel on Software Distribution with Sam Hartman, Richard Fontana, & Eben Moglen at the Software Freedom Law Center’s 15th Anniversary Fall Conference (at ~35m21s).
Vagrant Cascadian will present a talk at SCALE 18x in Pasadena, California on March 8th titled There and Back Again, Reproducibly.
Matt Graeber (@mattifestation) posted on Twitter that:
If you weren’t aware of the reason Portable Executable timestamps in Win 10 binaries were nonsensical, Raymond’s post explains the reason: to support reproducible builds.
… referencing an article by Raymond Chen from January 2018 which, amongst other things, mentions:
One of the changes to the Windows engineering system begun in Windows 10 is the move toward reproducible builds.
Jan Nieuwenhuizen announced the release of GNU Mes 0.22. Vagrant Cascadian subsequently uploaded this version to Debian which produced a bit-for-bit identical mescc-mes-static binary with the mes-rb5 package in GNU Guix.
Software development
diffoscope
diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of nondeterministic behaviour.
This month, diffoscope versions 135 and 136 were uploaded to Debian unstable by Chris Lamb. He also made a number of changes to diffoscope itself, including:
- New features:
  - Support external difference tools such as Meld, etc. similar to git-difftool(1). (#87)
  - Extract resources.arsc files as well as classes.dex from Android .apk files to ensure that we show the differences there. (#27)
  - Fall back to the regular .zip container format for .apk files if apktool is not available. […][…][…][…]
  - Drop --max-report-size-child and --max-diff-block-lines-parent; scheduled for removal in January 2018. […]
  - Append a comment to a difference if we fall back to a less-informative container format but we are missing a tool. […][…]
- Bug fixes:
  - No longer raise a KeyError exception if we request an invalid member from a directory container. […]
Documentation/workflow improvements:
- Logging improvements:
  - Log a debug-level message if we cannot open a file as container due to a missing tool to assist in diagnosing issues. […]
  - Correct a debug message related to compare_meta calls to quote the arguments correctly. […]
  - Add the current PATH environment variable to the Normalising locale... debug-level message. […]
  - Print the Starting diffoscope $VERSION line as the first line of the log as we are, well, starting diffoscope. […]
  - If we don’t know the HTML output name, don’t emit an enigmatically truncated HTML output for debug message. […]
- Tests:
  - Don’t exhaustively output the entire HTML report when testing the regression for #875281; parsing the JSON and pruning the tree should be enough. (#84)
  - Refresh and update the fixtures for the .ico tests to match the latest version of Imagemagick in Debian unstable. […]
- Code improvements:
  - Add a .git-blame-ignore-revs file to improve the output of git-blame(1) by ignoring large changes when introducing the Black source code reformatter and update the CONTRIBUTING.md guide on how to optionally use it locally. […]
  - Add a noqa line to avoid a false-positive Flake8 “unused import” warning. […]
  - Move logo.svg to under the doc/ directory […] and make setup.py executable […].
  - Tidy diffoscope.main’s configure method. […][…][…][…]
  - Drop an assertion that is guaranteed by parallel if conditional […] and an unused “Difference” import from the APK comparator. […]
  - Turn down the “volume” for a recommendation in a comment. […]
  - Rename the diffoscope.locale module to diffoscope.environ as we are modifying things beyond just the locale (eg. calling tzset, etc.) […]
  - Factor-out the generation of foo not available in path comment messages into the exception that raises them […] and factor out running all of our many zipinfo into a new method […].
- trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb fixed the PyPI.org release by adding the trydiffoscope script itself to the MANIFEST file and performing another release cycle. […]
In addition, Marc Herbert adjusted the cbfstool tests to search for expected keywords in the output, rather than specific output […], fixed a misplaced debugging line […] and added a “Testing” section to the CONTRIBUTING.rst […] file. Vagrant Cascadian updated to diffoscope 135 in GNU Guix.
reprotest
reprotest is our end-user tool that builds the same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, versions 0.7.11 and 0.7.12 were uploaded to Debian unstable by Holger Levsen. In addition, Iñaki Malerba improved the version test to split on the + character […] and Ross Vandegrift updated the code to allow the user to override timeouts from the surrounding environment […].
Holger Levsen also made the following additional changes:
- Drop the short timeout and use the install timeout instead. (#897442)
- Use “real” reStructuredText comments instead of using the raw directive. […]
- Update the PyPI classifier to express we are using Python 3.7 now. […]
Other tools
- disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Chris Lamb fixed an issue by ignoring the return values of fsyncdir to ensure (for example) dpkg(1) can “flush” /var/lib/dpkg correctly […] and merged a change from Helmut Grohne to use the build architecture’s version of pkg-config to permit cross-architecture builds […].
- strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, version 1.6.3-2 was uploaded to Debian unstable by Holger Levsen to bump the Standards-Version. […]
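One common source of the non-determinism that such tools deal with is the modification time embedded in the header of Gzip files. The following is a minimal Python sketch of avoiding that timestamp at compression time; it is an illustration only and not how strip-nondeterminism itself is implemented. The helper name `deterministic_gzip` is hypothetical.

```python
import gzip
import io


def deterministic_gzip(data):
    """Compress `data` so that the output depends only on the input bytes."""
    buffer = io.BytesIO()
    # filename="" keeps any original file name out of the header, and
    # mtime=0 stops the current time from being embedded in it.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buffer, mtime=0) as gz:
        gz.write(data)
    return buffer.getvalue()
```

Compressing the same bytes twice with this helper yields identical output regardless of when it is run, whereas the default behaviour of `gzip.GzipFile` records the current time.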
Upstream development
The Reproducible Builds project detects, dissects and attempts to fix as many unreproducible packages as possible. Naturally, we endeavour to send all of our patches upstream. This month, we wrote another large number of such patches, including:
- Arnout Engelen (for the NixOS distribution):
  - bash (enable PGRP_PIPE regardless of build-time kernel version)
  - jitterentropy (remove timestamps from Gzip-compressed manpages, already filed upstream)
  - ms-sys (remove timestamps from .gz manpages, already upstream)
- Bernhard M. Wiedemann (for the openSUSE distribution):
  - ImageMagick (toolchain, .png date)
  - brickv (sort a Python glob/readdir(3))
  - cpython (.pyc reproducibility)
  - doxygen (merged a toolchain patch to prevent nondeterminism from ASLR)
  - fastjet-contrib (sort find/readdir)
  - openjfx (Java date)
  - ruby (Reopen unsorted Ruby glob issue)
  - rubygem-sassc (sort a Ruby readdir(3))
- Chris Lamb:
  - #948279 filed against python-gmusicapi.
  - #948582 filed against bochs.
  - #948872 filed against pcbasic.
  - #949379 filed against vmatch.
  - #949580 filed against pkg-js-tools.
  - #949684 filed against mcomix.
  - #949817 filed against shotcut (forwarded upstream).
  - #950138 filed against pikepdf (forwarded upstream).
- Jelle van der Waa (Arch Linux):
- Martin Liška:
  - gcc (toolchain, fixing randomness in some .o files, with Alexander Monakov & Richard Biener)
- Vagrant Cascadian submitted a large number of patches via the Debian bug tracking system targeting the packages that the Civil Infrastructure Platform has identified via the CIP package set, including:
  - #948757 & #948759 filed against apache2.
  - #948771 filed against guile-2.2.
  - #949114 & #949115 filed against alsa-tools.
  - #949270 & #949271 filed against libtool.
  - #949273 & #949275 filed against geoip.
  - #949324 filed against groff.
  - #949338 filed against gettext.
  - #949341 filed against sqlite3.
  - #949342 & #949343 filed against flex.
  - #949346 & #949348 filed against libnet.
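Several of the patches listed above sort the results of a glob or readdir(3) call, as the order in which a filesystem returns directory entries is not guaranteed. A minimal Python sketch of this kind of fix follows; the function name `find_inputs` and the `*.txt` default pattern are illustrative assumptions and are not taken from any of the patched packages.

```python
import glob
import os


def find_inputs(directory, pattern="*.txt"):
    """Return the matching paths in a stable, sorted order."""
    # glob.glob() returns entries in whatever order the filesystem provides,
    # so sort them before they can influence the build output.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Any later step that concatenates, archives or links these files then sees them in the same order on every build host.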
Distribution work
openSUSE
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update and submitted the following bugs and patches:
- doxygen (toolchain, ASLR; already merged upstream)
- frotz (version update & date)
- gcc9 (report unreproducible .o files, forwarded upstream)
- mingw* (report random filename in .a files)
- perl-TimeDate (fix a “year 2020” bug, forwarded upstream)
- python-sherpa (CPU-detection via --mtune=native)
- qpress (make PGO reproducible)
- rubygem-sassc (CPU & readdir, partially submitted upstream)
- stgit (recreate unreproducible .pyc files with fixed filesystem readdir(3) order)
- xmvn (report nondeterminism from filesystem order and randomness)
Many Python packages were updated to avoid writing .pyc files with an embedded random path, including jupyter-jupyter-wysiwyg, jupyter-jupyterlab-latex, python-PsyLab, python-hupper, python-ipyevents (don’t rewrite .zip file), python-ipyleaflet, python-jupyter-require, python-jupyter_kernel_test, python-nbdime (do not rewrite .zip, avoid time-based .pyc), python-nbinteract, python-plaster, python-pythreejs, python-sidecar & tensorflow (use pip install --no-compile).
Debian
There was yet more progress towards making the Debian Installer images reproducible. Following on from last month’s efforts, Chris Lamb requested a status update on the Debian bug in question.
Daniel Schepler posted to the debian-devel mailing list to ask whether “running dpkg-buildpackage manually from the command line” is supported, particularly with respect to cases where having extra packages installed while the package was built resulted in either a failed build or even broken packages (eg. #948522, #887902, etc.). Our .buildinfo files could be one solution to this as they record the environment at the time of the package build.
Holger disabled scheduling of packages from the “oldstable” stretch release on tests.reproducible-builds.org. This is the first time since stretch’s existence that we are no longer testing this release.
OpenJDK, a free and open-source implementation of the Java Platform, was updated in Debian to incorporate a number of patches from Emmanuel Bourg, including:
- Make the generated character data source files reproducible. (#933339)
- Make the generated module-info.java files reproducible. (#933342)
- Make the generated copyright headers reproducible. (#933349)
- Make the build user reproducible. (#933373)
83 reviews of Debian packages were added, 32 were updated and 96 were removed this month, adding to our knowledge about identified issues. Many issue types were updated by Chris Lamb, including timestamp_in_casacore_tables, random_identifiers_in_epub_files_generated_by_asciidoc, nondeterministic_ordering_in_casacore_tables, captures_build_path_in_golang_compiler, captures_build_path_via_haskell_adddependentfile & png_generated_by_plantuml_captures_kernel_version_and_builddate.
Lastly, Mattia Rizzolo altered the permissions and shared the notes.git repository which underpins the aforementioned package classifications with the entire “Debian” group on Salsa, therefore giving all DDs write access to it. This is an attempt to invite more direct contributions instead of merge requests.
Other distributions
The FreeBSD Project Tweeted that:
Reproducible builds are turned on by default for -RELEASE […]
… which targets the next released version of this distribution (view revision). Daniel Ebdrup followed-up to note that this option:
Used to be turned on in -CURRENT when it was being tested, but it has been turned off now that there’s another branch where it’s used, whereas -CURRENT has more need to have the revision printed in uname (which is one of the things that make a build unreproducible). […]
For Alpine Linux, Holger Levsen disabled the builders run by the Reproducible Builds project as our patch to the abuild utility (see December’s report) doesn’t apply anymore and thus all builds have become unreproducible again. Subsequent to this, a patch was merged upstream. […]
In GNU Guix, on January 14th, Konrad Hinsen published a blog post entitled Reproducible computations with Guix which, amongst other things, remarks that:
The [guix time-machine command] actually downloads the specified version of Guix and passes it the rest of the command line. You are running the same code again. Even bugs in Guix will be reproduced faithfully!
The Yocto Project reported that they have reproducible cross-built binaries that are independent of both the underlying host distribution the build is run on and the path used for the build. This is now being continually tested on the Yocto Project’s automated infrastructure to ensure this state is maintained in the future.
Project website & documentation
There was more work performed on our website this month, including:
- Chris Lamb:
  - Python SOURCE_DATE_EPOCH documentation: clarify that the second example generates a Python str-type, not a datetime.datetime […]
  - Correct word omissions in the report template. […]
  - Link to our mailing list overview page (and not the archives). […]
  - Apply the Black source code reformatter to the draft generation script. […]
  - Move continuous tests heading level to <h1> (vs. <h2>) to match the other pages. […]
  - Calculate the report authors dynamically. […]
- Holger Levsen:
  - Add Alpine Linux to our projects and testing pages. […]
  - Add links to our list of projects being tested […] and mark Fedora as being disabled at this time […].
In addition, Arnout Engelen added a Scala programming language example for the SOURCE_DATE_EPOCH environment variable […], David del Amo updated the link to the Software Freedom Conservancy to remove some double parentheses […] and Peter Wu added a Debian example for the -ffile-prefix-map argument to support Clang version 10 […].
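To illustrate the distinction between a str and a datetime.datetime mentioned in the Python documentation change, here is a minimal sketch of consuming SOURCE_DATE_EPOCH in Python. It is written for this report and is not a copy of the examples on our website; the helper names are hypothetical, and falling back to the current time when the variable is unset is an assumption about the desired behaviour.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp():
    """Return SOURCE_DATE_EPOCH as an int, or the current time if it is unset."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))


def build_datetime():
    """Return the build date as a timezone-aware datetime.datetime in UTC."""
    return datetime.fromtimestamp(build_timestamp(), tz=timezone.utc)


def build_date_string():
    """Return the build date as a str such as '2020-01-31', always in UTC."""
    return time.strftime("%Y-%m-%d", time.gmtime(build_timestamp()))
```

Both helpers use UTC so that the result does not additionally depend on the timezone of the build host.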
Testing framework
We operate a fully-featured and comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:
- Adrian Bunk:
  - Use the et_EE locale/language instead of fr_CH. In Estonian, the z character is sorted between s and t which is contrary to common incorrect assumptions about the sorting order of ASCII characters. […]
  - Add ffile_prefix_map_passed_to_clang to the list of issues filtered as these build failures should be ignored. […]
  - Remove the ftbfs_build_depends_not_available_on_amd64 from the list of filtered issues as this specific problem no longer exists. […]
- Holger Levsen:
  - Debian:
    - Always configure apt to ignore expired release files on hosts running in the future. […]
    - Create an “oldsuites” page, showing suites we used to test in the past. […][…][…][…][…]
    - Schedule more old packages from the buster distribution. […]
    - Deal with shell escaping and other options. […][…][…]
    - Reverse the suite ordering on the packages page. […][…]
    - Show bullseye statistics on dashboard page, moving away from buster […] and additionally omit stretch […].
  - F-Droid:
    - Document the increased diskspace requirements; we require over 700 GiB now. […]
  - Misc:
- Jelle van der Waa (Arch Linux):
- Mattia Rizzolo:
- Vagrant Cascadian special-cased u-boot on the armhf architecture: first, do not build the all architecture as the dependencies are not available on this architecture […] and second, pass the --binary-arch argument to pbuilder […].
The usual node maintenance was performed by Mattia Rizzolo […][…], Vagrant Cascadian […][…][…][…] and Holger Levsen.
Misc news
On our mailing list this month:
- Chris Lamb responded in-depth to a thread on Reproducible system images that was started in December by Lars Wirzenius. This then led to a sub-thread regarding reproducible Docker images.
- Holger Levsen posted a brief request for help regarding the bot that lives on our #reproducible-builds IRC channel that interfaces with our Twitter handle.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Reddit: /r/ReproducibleBuilds
- Mailing list: rb-general@lists.reproducible-builds.org
This month’s report was written by Arnout Engelen, Bernhard M. Wiedemann, Chris Lamb, heinrich5991, Holger Levsen, Jelle van der Waa, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.