Reproducible Builds in July 2020

View all our monthly reports


Welcome to the July 2020 report from the Reproducible Builds project.

In these monthly reports, we round-up the things that we have been up to over the past month. As a brief refresher, the motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced from the original free software source code to the pre-compiled binaries we install on our systems. (If you’re interested in contributing to the project, please visit our main website.)

General news

At the upcoming DebConf20 conference (now being held online), Holger Levsen will present a talk on Thursday 27th August about “Reproducing Bullseye in practice”, focusing on independently verifying that the binaries distributed from ftp.debian.org were made from their claimed sources.

Tavis Ormandy published a blog post making the provocative claim that “You don’t need reproducible builds”, asserting elsewhere that the many attacks that have been extensively reported in our previous reports are “fantasy threat models”. A number of rebuttals have been made, including one from long-time contributor Reproducible Builds contributor Bernhard Wiedemann.

On our mailing list this month, Debian Developer Graham Inggs posted to our list asking for ideas why the openorienteering-mapper Debian package was failing to build on the Reproducible Builds testing framework. Chris Lamb remarked from the build logs that the package may be missing a build dependency, although Graham then used our own diffoscope tool to show that the resulting package remains unchanged with or without it. Later, Nico Tyni noticed that the build failure may be due to the relationship between the FILE C preprocessor macro and the -ffile-prefix-map GCC flag.

An issue in Zephyr, a small-footprint kernel designed for use on resource-constrained systems, around .a library files not being reproducible was closed after it was noticed that a key part of their toolchain was updated that now calls --enable-deterministic-archives by default.

Reproducible Builds developer kpcyrd commented on a pull request against the libsodium cryptographic library wrapper for Rust, arguing against the testing of CPU features at compile-time. He noted that:

I’ve accidentally shipped broken updates to users in the past because the build system was feature-tested and the final binary assumed the instructions would be present without further runtime checks

David Kleuker also asked a question on our mailing list about using SOURCE_DATE_EPOCH with the install(1) tool from GNU coreutils. When comparing two installed packages he noticed that the filesystem ‘birth times’ differed between them. Chris Lamb replied, realising that this was actually a consequence of using an outdated version of diffoscope and that a fix was in diffoscope version 146 released in May 2020.

Later in July, John Scott posted asking for clarification regarding on the Javascript files on our website to add metadata for LibreJS, the browser extension that blocks non-free Javascript scripts from executing. Chris Lamb investigated the issue and realised that we could drop a number of unused Javascript files [][][] and added unminified versions of Bootstrap and jQuery [].


Development work

Website

On our website this month, Chris Lamb updated the main Reproducible Builds website and documentation to drop a number of unused Javascript files [][][] and added unminified versions of Bootstrap and jQuery []. He also fixed a number of broken URLs [][].

Gonzalo Bulnes Guilpain made a large number of grammatical improvements [][][][][] as well as some misspellings, case and whitespace changes too [][][].

Lastly, Holger Levsen updated the README file [], marked the Alpine Linux continuous integration tests as currently disabled [] and linked the Arch Linux Reproducible Status page from our projects page [].

diffoscope

diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds. In July, Chris Lamb made the following changes to diffoscope, including releasing versions 150, 151, 152, 153 & 154:

  • New features:

    • Add support for flash-optimised F2FS filesystems. (#207)
    • Don’t require zipnote(1) to determine differences in a .zip file as we can use libarchive. []
    • Allow --profile as a synonym for --profile=-, ie. write profiling data to standard output. []
    • Increase the minimum length of the output of strings(1) to eight characters to avoid unnecessary diff noise. []
    • Drop some legacy argument styles: --exclude-directory-metadata and --no-exclude-directory-metadata have been replaced with --exclude-directory-metadata={yes,no}. []
  • Bug fixes:

    • Pass the absolute path when extracting members from SquashFS images as we run the command with working directory in a temporary directory. (#189)
    • Correct adding a comment when we cannot extract a filesystem due to missing libguestfs module. []
    • Don’t crash when listing entries in archives if they don’t have a listed size such as hardlinks in ISO images. (#188)
  • Output improvements:

    • Strip off the file offset prefix from xxd(1) and show bytes in groups of 4. []
    • Don’t emit javap not found in path if it is available in the path but it did not result in an actual difference. []
    • Fix ... not available in path messages when looking for Java decompilers that used the Python class name instead of the command. []
  • Logging improvements:

    • Add a bit more debugging info when launching libguestfs. []
    • Reduce the --debug log noise by truncating the has_some_content messages. []
    • Fix the compare_files log message when the file does not have a literal name. []
  • Codebase improvements:

    • Rewrite and rename exit_if_paths_do_not_exist to not check files multiple times. [][]
    • Add an add_comment helper method; don’t mess with our internal list directly. []
    • Replace some simple usages of str.format with Python ‘f-strings’ [] and make it easier to navigate to the main.py entry point [].
    • In the RData comparator, always explicitly return None in the failure case as we return a non-None value in the success one. []
    • Tidy some imports [][][] and don’t alias a variable when we do not use it. []
    • Clarify the use of a separate NullChanges quasi-file to represent missing data in the Debian package comparator [] and clarify use of a ‘null’ diff in order to remember an exit code. []
  • Other changes:

Jean-Romain Garnier also made the following changes:

  • Allow passing a file with a list of arguments via diffoscope @args.txt. (!62)
  • Improve the output of side-by-side diffs by detecting added lines better. (!64)
  • Remove offsets before instructions in objdump [][] and remove raw instructions from ELF tests [].

Other tools

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. In July, Chris Lamb ensured that we did not install the internal handler documentation generated from Perl POD documents [] and fixed a trivial typo []. Marc Herbert added a --verbose-level warning when the Archive::Cpio Perl module is missing. (!6)

reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, Vagrant Cascadian made a number of changes to support diffoscope version 153 which had removed the (deprecated) --exclude-directory-metadata and --no-exclude-directory-metadata command-line arguments, and updated the testing configuration to also test under Python version 3.8 [].


Distributions

Debian

In June 2020, Timo Röhling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of hundreds of packages that use the CMake build system. This month however, Niels Thykier uploaded debhelper version 13.2 that passes the -DCMAKE_SKIP_RPATH=ON and -DBUILD_RPATH_USE_ORIGIN=ON arguments to CMake when using the (currently-experimental) Debhelper compatibility level 14.

According to Niels, this change:

… should fix some reproducibility issues, but may cause breakage if packages run binaries directly from the build directory.

34 reviews of Debian packages were added, 14 were updated and 20 were removed this month adding to our knowledge about identified issues. Chris Lamb added and categorised the nondeterministic_order_of_debhelper_snippets_added_by_dh_fortran_mod [] and gem2deb_install_mkmf_log [] toolchain issues.

Lastly, Holger Levsen filed two more wishlist bugs against the debrebuild Debian package rebuilder tool [][].

openSUSE

In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.

Bernhard also published the results of performing 12,235 verification builds of packages from openSUSE Leap version 15.2 and, as a result, created three pull requests against the openSUSE Build Result Compare Script [][][].

Other distributions

In Arch Linux, there was a mass rebuild of old packages in an attempt to make them reproducible. This was performed because building with a previous release of the pacman package manager caused file ordering and size calculation issues when using the btrfs filesystem.

A system was also implemented for Arch Linux packagers to receive notifications if/when their package becomes unreproducible, and packagers now have access to a dashboard where they can all see all their unreproducible packages (more info).

Paul Spooren sent two versions of a patch for the OpenWrt embedded distribution for adding a ‘build system’ revision to the ‘packages’ manifest so that all external feeds can be rebuilt and verified. [][]

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including:

Vagrant Cascadian also reported two issues, the first regarding a regression in u-boot boot loader reproducibility for a particular target [] and a non-deterministic segmentation fault in the guile-ssh test suite []. Lastly, Jelle van der Waa filed a bug against the MeiliSearch search API to report that it embeds the current build date.

Testing framework

We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org.

This month, Holger Levsen made the following changes:

  • Debian-related changes:

    • Tweak the rescheduling of various architecture and suite combinations. [][]
    • Fix links for ‘404’ and ‘not for us’ icons. (#959363)
    • Further work on a rebuilder prototype, for example correctly processing the sbuild exit code. [][]
    • Update the sudo configuration file to allow the node health job to work correctly. []
    • Add php-horde packages back to the pkg-php-pear package set for the bullseye distribution. []
    • Update the version of debrebuild. []
  • System health check development:

    • Add checks for broken SSH [], logrotate [], pbuilder [], NetBSD [], ‘unkillable’ processes [], unresponsive nodes [][][][], proxy connection failures [], too many installed kernels [], etc.
    • Automatically fix some failed systemd units. []
    • Add notes explaining all the issues that hosts are experiencing [] and handle zipped job log files correctly [].
    • Separate nodes which have been automatically marked as down [] and show status icons for jobs with issues [].
  • Misc:

In addition, Mattia Rizzolo updated the init_node script to suggest using sudo instead of explicit logout and logins [][] and the usual build node maintenance was performed by Holger Levsen [][][][][][], Mattia Rizzolo [][] and Vagrant Cascadian [][][][].



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:




View all our monthly reports