Welcome to the August 2019 report from the Reproducible Builds project!
In these monthly reports we outline the most important things that have happened in the world of Reproducible Builds and what we have been up to.
As a quick recap of our project, whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed to end users or systems as precompiled binaries. The motivation behind the reproducible builds effort is to ensure zero changes have been introduced during these compilation processes. This is achieved by promising identical results are always generated from a given source thus allowing multiple third-parties to come to a consensus on whether a build was changed or even compromised.
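As a rough illustration of the consensus idea described above, the following Python sketch compares checksums of the same artifact as produced by several independent builders. The builder names, the sample byte strings and the `consensus` helper are all hypothetical and exist only for this example; real rebuilders would compare actual package files.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a build artifact."""
    return hashlib.sha256(data).hexdigest()


def consensus(artifacts: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the most common digest and the builders that disagree with it.

    `artifacts` maps a (hypothetical) builder name to the bytes it produced.
    """
    digests = {builder: sha256_of(blob) for builder, blob in artifacts.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    dissenters = sorted(b for b, d in digests.items() if d != majority_digest)
    return majority_digest, dissenters


# Made-up artifacts: two builders agree, one produced something different.
builds = {
    "distro": b"\x7fELF...binary...",
    "rebuilder-a": b"\x7fELF...binary...",
    "rebuilder-b": b"\x7fELF...binary+backdoor...",
}
digest, dissenters = consensus(builds)
print("majority digest:", digest)
print("builders that disagree:", dissenters)
```

A disagreement does not by itself prove a compromise, but it flags exactly which build needs a closer look.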
In this month’s report, we cover:
- Media coverage & events: Webmin, CCCamp, etc.
- Distribution work: The first fully-reproducible package sets, openSUSE update, etc.
- Upstream news: libfaketime updates, gzip, ensuring good definitions, etc.
- Software development: More work on diffoscope, new variations in our testing framework, etc.
- Misc news: From our mailing list, etc.
- Getting in touch: How to contribute, etc.
If you are interested in contributing to our project, please visit our Contribute page on our website.
Media coverage & events
A backdoor was found in Webmin, a popular web-based application used by sysadmins to remotely manage Unix-based systems. Whilst more details can be found on upstream’s dedicated exploit page, it appears that the build toolchain was compromised. Especially of note is that the exploit “did not show up in any Git diffs” and thus would not have been found via an audit of the source code. The backdoor would allow a remote attacker to execute arbitrary commands with superuser privileges on the machine running Webmin. Once a machine is compromised, an attacker could then use it to launch attacks on other systems managed through Webmin or indeed any other connected system. Techniques such as reproducible builds can help detect exactly these kinds of attacks that can lie dormant for years. (LWN comments)
Holger Levsen and Vagrant Cascadian presented a talk on Reproducible Builds titled There and Back Again, Reproducibly! at the 2019 edition of the Linux Developer Conference in São Paulo, Brazil.
LWN posted and hosted an interesting summary and discussion on Hardening the file utility for Debian. In July, Chris Lamb had cross-posted his reply to the “Re: file(1) now with seccomp support enabled” thread, originally started on the debian-devel mailing list. In this post, Chris refers to our strip-nondeterminism tool not being able to accommodate the additional security hardening in file(1) and the changes made to the tool in order to fix this issue, which was causing a huge number of regressions in our testing framework.
The Chaos Communication Camp (an international, five-day open-air event for hackers that provides a relaxed atmosphere for free exchange of technical, social, and political ideas) hosted its 2019 edition, where there were many discussions and meet-ups at least partly related to Reproducible Builds. This included the titular Reproducible Builds Meetup session, which was attended by around twenty-five people, half of whom were new to the project, as well as a session dedicated to all Arch Linux related issues.
Distribution work
In Debian, the first “package sets” (ie. defined subsets of the entire archive) have become 100% reproducible, including the so-called “essential” set for the bullseye distribution on the amd64 and the armhf architectures. This is thanks to work by Chris Lamb on bash, readline and other low-level libraries and tools. Perl still has issues on i386 and arm64, however.
Dmitry Shachnev filed a bug report against the debhelper utility that speaks to issues around using the date from the debian/changelog file as the source for the SOURCE_DATE_EPOCH environment variable, as this can lead to non-intuitive results when a package is automatically rebuilt via so-called binary (NB. not “source”) NMUs. A related issue was later filed against qtbase5-dev by Helmut Grohne, as this exact problem led to an issue with co-installability across architectures.
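For readers unfamiliar with the variable, here is a minimal Python sketch of how a build tool might honour SOURCE_DATE_EPOCH: it uses the value as the build timestamp when set and only falls back to the current time otherwise. The `build_timestamp` helper and the sample epoch value are assumptions made for this illustration and are not taken from debhelper.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> datetime:
    """Return the timestamp a build should embed in its output.

    Prefers SOURCE_DATE_EPOCH (seconds since the Unix epoch, UTC) and only
    falls back to the wall clock when the variable is unset.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    epoch = int(raw) if raw is not None else int(time.time())
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# Hypothetical value, as a packaging tool might derive from a changelog entry.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1565000000"})
print(stamp.isoformat())
```

Because the value is typically derived from the latest changelog entry, two rebuilds that share a changelog date will embed the same timestamp even if they were built at different times or against different dependencies. That is a plausible reading of the non-intuitive results mentioned above.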
Lastly, 115 reviews of Debian packages were added, 45 were updated and 244 were removed this month, appreciably adding to our knowledge about identified issues. Many issue types were updated by Chris Lamb, including embeds_build_data_via_node_preamble, embeds_build_data_via_node_rollup, captures_build_path_in_beam_cma_cmt_files, captures_varying_number_of_build_path_directory_components (discussed later), timezone_specific_files_due_to_haskell_devscripts, etc.
Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution. New issues were found from enabling Link Time Optimization (LTO) in this distribution’s Tumbleweed branch. This affected, for example, nvme-cli as well as perl-XML-Parser and pcc with packaging issues.
Upstream news
- libfaketime is a tool to trick programs into believing that the current system time is actually one specified by the user. This month, Bernhard M. Wiedemann requested the ability to track and intercept calls that change file timestamps, which can help better debug or fix reproducibility issues in software.
- Chris Lamb requested that the “molior” build tool prefer to use the term “repeatable build” in order to avoid confusion over the term “reproducible”.
- The “gzip” program is commonly used to compress artifacts such as the source code archives generated by the Sourcehut hosting platform, but depending on the specific program used, the output may be different. Daniel Edgecumbe has submitted patches to the BusyBox suite of tools to ensure the output of its version of gzip matches the output of GNU gzip when using the same options regardless of the configuration of BusyBox. In the process, an off-by-one error in the default settings was also fixed.
- There was more progress on ensuring that the gem tool in rubygems respects the SOURCE_DATE_EPOCH environment variable.
- A request to include .buildinfo files in the OpenWRT operating system, which targets embedded devices such as routers, etc., was accepted and merged upstream.
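The BusyBox work above concerns differences between gzip implementations. A related but distinct source of variation is the metadata stored in the gzip header itself. The following Python sketch shows only that second aspect, using the standard library's gzip module. The `gzip_with_mtime` helper is a name invented for this example, and pinning the header says nothing about whether two different compressors emit the same compressed stream.

```python
import gzip
import io


def gzip_with_mtime(data: bytes, mtime: int) -> bytes:
    """Compress `data`, writing `mtime` into the gzip header.

    An empty filename keeps the optional FNAME header field out of the
    output, so the timestamp is the only header value that varies here.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as fh:
        fh.write(data)
    return buf.getvalue()


payload = b"reproducible builds\n" * 10

first = gzip_with_mtime(payload, mtime=0)
second = gzip_with_mtime(payload, mtime=0)
later = gzip_with_mtime(payload, mtime=1565000000)

print("pinned timestamp, identical output:", first == second)
print("different timestamp, identical output:", first == later)
```

Bytes 4 to 7 of a gzip stream hold the modification time, which is why the same input compressed at two different times normally yields two different files unless the timestamp is pinned.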
Software development
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. In August we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - buildad (date)
  - dracut (CPU influences build result)
  - fwupd (unreproducible LTO data)
  - gnutls (date / copyright year)
  - katacontainers-image-initrd/osbuilder (shell date; new variant with nanoseconds)
  - kernel-obs-build (date from /etc/shadow)
  - kernel-vanilla (drop number of CPUs)
  - libfaketime (toolchain: fix various builds under libfaketime)
  - nethack (date and tar(1))
  - pcc (unreproducible when building with LTO)
  - python-ipyparallel (Fails to build with a single CPU / -j1)
  - python-pytest-httpserver (renew SSL certs to fix FTBFS after September 2019)
  - python-python3-saml (Fails to build in 2020)
  - sblim-cmpi-base (Disable parallel make due to broken build dependencies)
- Chris Lamb:
  - #872728 filed against desktop-file-utils (closed).
  - #933783 filed against virulencefinder.
  - #933790 filed against norsnet.
  - #933834 filed against haskell-devscripts.
  - #933838 filed against superlu-dist.
  - #934120 filed against python-bleach.
  - #934697 filed against re2c (filed upstream).
  - #934698 filed against libchamplain (filed upstream).
  - #934699 filed against scons.
  - #934767 filed against ecbuild.
  - #934918 filed against python-etcd3gw.
  - #934919 filed against omnidb.
  - #935127 filed against bash.
  - #935361 filed against node-autoprefixer.
  - #935362 filed against gdbm.
  - #935363 filed against readline.
  - #935790 filed against node-package-preamble.
  - #935846 filed against musescore-snapshot.
  - #936452 filed against ust-fs-extra.
  - #936453 filed against litl.
- Mathieu Parent:
  - php-pear (fixes over 150 packages with date issues)
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
This month, Chris Lamb made the following changes:
- Improvements:
  - Don’t fall back to an unhelpful raw hexdump when, for example, readelf(1) reports a minor issue in a section in an ELF binary. For example, when the .frames section is of the NOBITS type its contents are apparently “unreliable” and thus readelf(1) returns 1. (#58, #931962)
  - Include either standard error or standard output (not just the latter) when an external command fails. […]
- Bug fixes:
  - Skip calls to unsquashfs when we are neither root nor running under fakeroot. (#63)
  - Ensure that all of our artificially-created subprocess.CalledProcessError instances have output attributes that are bytes objects, not str. […]
  - Correct a reference to parser.diff; diff in this context is a Python function in the module. […]
  - Avoid a possible traceback caused by a str/bytes type confusion when handling the output of failing external commands. […]
- Testsuite improvements:
  - Test for 4.4 in the output of squashfs -version, even though the Debian package version is 1:4.3+git190823-1. […]
  - Apply a patch from László Böszörményi to update the squashfs test output and additionally bump the required version for the test itself. (#62 & #935684)
  - Add the wabt Debian package to the test-dependencies so that we run the WebAssembly tests on our continuous integration platform, etc. […]
- Improve debugging:
  - Add the containing module name to the (eg.) “Using StaticLibFile for ...” debugging messages. […]
  - Strip off trailing “original size modulo 2^32 671” (etc.) from gzip compressed data as this is just a symptom of the contents itself changing that will be reflected elsewhere. (#61)
  - Avoid a lack of space between “... with return code 1” and “Standard output”. […]
  - Improve debugging output when instantiating our Comparator object types. […]
  - Add a literal “eg.” to the comment on stripping “original size modulo...” text to emphasise that the actual numbers are not fixed. […]
- Internal code improvements:
  - No need to parse the section group from the class name; we can pass it via type built-in kwargs argument. […]
  - Add support to Difference.from_command_exc and friends to ignore specific returncodes from the called program and treat them as “no” difference. […]
  - Simplify parsing of optional command_args argument to Difference.from_command_exc. […]
  - Set long_description_content_type to text/x-rst to appease the PyPI.org linter. […]
  - Reposition a comment regarding an exception within the indented block to match Python code convention. […]
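Two of the changes above concern external commands that exit non-zero yet still produce usable output, and keeping that output as bytes. The sketch below illustrates the general pattern with Python's subprocess module. It is not diffoscope's actual code: `run_tool` and its `ignore_returncodes` parameter are hypothetical names chosen for this example.

```python
import subprocess
import sys


def run_tool(cmd, ignore_returncodes=()):
    """Run `cmd` and return its standard output as bytes.

    Return codes listed in `ignore_returncodes` are treated as success, for
    tools that signal minor warnings through a non-zero exit status.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0 and proc.returncode not in ignore_returncodes:
        # Keep both streams as bytes so later handling never mixes str/bytes.
        raise subprocess.CalledProcessError(
            proc.returncode, cmd, output=proc.stdout, stderr=proc.stderr
        )
    return proc.stdout


# Stand-in for a tool that prints useful output but exits with status 1.
noisy_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('partial'); sys.exit(1)",
]

print(run_tool(noisy_cmd, ignore_returncodes={1}))

try:
    run_tool(noisy_cmd)
except subprocess.CalledProcessError as exc:
    print("failed with", exc.returncode, "output type:", type(exc.output).__name__)
```

Passing the tolerated return codes explicitly keeps the decision at the call site, so a genuinely unexpected exit status still raises.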
In addition, Mattia Rizzolo made the following changes:
Lastly, Vagrant Cascadian updated diffoscope to versions 120, 121 and 122 in the GNU Guix distribution.
strip-nondeterminism
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb made the following changes:
- Add support for enabling and disabling specific normalizers via the command line. (#10)
- Drop accidentally-committed warning emitted on every fixture-based test. […]
- Reintroduce the .ar normalizer […] but disable it by default so that it can be enabled with --normalizers=+ar or similar. (#3)
- In verbose mode, print the normalizers that strip-nondeterminism will apply. […]
In addition, there was some movement on an issue in the Archive::Zip Perl module that strip-nondeterminism uses regarding the lack of support for bzip compression that was originally filed in 2016 by Andrew Ayer.
Test framework
We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org.
This month Vagrant Cascadian suggested and subsequently implemented that we additionally test build directories of different string lengths (eg. /path/to/123 vs /path/to/123456) and also vary the number of directory components within them (eg. /path/to/dir vs. /path/to/parent/subdir). Curiously, whilst it was a priori believed that this was rather unlikely to yield differences, Chris Lamb has managed to identify approximately twenty packages that are affected by this issue.
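One plausible way for the number of directory components to leak into a build, independently of the path's length, is a tool that records a relative path from the build directory to some fixed location. The Python sketch below demonstrates that mechanism only. The `embedded_reference` helper and the /usr/include target are assumptions made for this example, not a description of how the affected packages actually behave.

```python
import posixpath


def embedded_reference(build_dir: str, target: str = "/usr/include") -> str:
    """Return the relative path a (hypothetical) tool might embed in its output.

    The result contains one ".." per directory component of `build_dir`, so
    it depends on the depth of the build path rather than on its name.
    """
    return posixpath.relpath(target, start=build_dir)


# Same depth, different name lengths: the embedded string is identical.
print(embedded_reference("/path/to/123"))
print(embedded_reference("/path/to/123456"))

# Different depth: the embedded string changes.
print(embedded_reference("/path/to/dir"))
print(embedded_reference("/path/to/parent/subdir"))
```

In this scenario, varying only the name or length of the build directory would never expose the difference, which is why testing with a different number of components can find issues that the earlier variation missed.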
It was also noticed that our testing of the Coreboot free software firmware fails to build the toolchain since we switched to building on the Debian buster distribution. The last successful build was on August 7th but all newer builds have failed.
In addition, the following code changes were performed in the last month:
- Chris Lamb: Ensure that the size of the log for the second build in HTML pages was also correctly formatted (eg. “12KB” vs “12345”). […]
- Holger Levsen:
  - Many changes related to updating our build nodes to the buster distribution for Debian. […][…][…][…][…][…][…]
  - Attempt to automatically fixup spurious build failures. […]
  - Update the maintainer address for the Debian team tasked with maintaining the MATE desktop. […]
  - Try not to build all the release tags of tools such as diffoscope, etc. […]
  - Use a newer kernel to support building the latest Arch Linux packages. […]
  - Re-add checks for “zombie” and log file size sanity checks. […][…][…][…]
  - Vary the choice of kernel on the amd64 again by using the kernel from Debian “backports”. […]
  - Drop some ancient Debian jessie-related configuration. […][…][…]
- Mathieu Parent: Update the contact details for the Debian PHP Group. […]
- Mattia Rizzolo:
The usual node maintenance was performed by Holger Levsen […][…] and Vagrant Cascadian […].
Misc news
There was yet more effort put into our website this month, including misc copyediting by Chris Lamb […], Mathieu Parent referencing his fix for php-pear […] and Vagrant Cascadian updating a link to his homepage. […]
On our mailing list this month, Santiago Torres Arias started a Setting up a MS-hosted rebuilder with in-toto metadata thread regarding Microsoft’s interest in setting up a rebuilder for Debian packages, touching on issues of transparency logs and the integration of in-toto by the Secure Systems Lab at New York University. In addition, Lars Wirzenius continued the conversation regarding various questions about reproducible builds and their bearing on building a distributed continuous integration system.
Lastly, in a thread titled Reproducible Builds technical introduction tutorial Jathan asked whether anyone had some “easy” Reproducible Builds tutorials in slides, video or written document format.
Getting in touch
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mailing list: rb-general@lists.reproducible-builds.org
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa, Mathieu Parent and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.