Reproducible Builds in October 2019

View all our monthly reports


Welcome to the October 2019 report from the Reproducible Builds project. đź‘Ś

In our monthly reports we attempt outline the most important things that we have been up to recently. As a reminder on what our little project is all about, whilst anyone can inspect the source code of free software for malicious changes most software is distributed to end users or servers as precompiled binaries. Reproducible builds tries to ensure that no changes have been made during these compilation processes by promising identical results are always generated from a given source, allowing multiple third-parties to come to a consensus on whether a build was compromised.

In this month’s report, we will cover:

  • Media coverage & conferences — Reproducible builds in Belfast & science
  • Reproducible Builds Summit 2019 — Registration & attendees, etc.
  • Distribution work — The latest work in Debian, OpenWrt, openSUSE, and more…
  • Software development — More diffoscope development, etc.
  • Getting in touch — How to contribute & get in touch

If you are interested in contributing to our venture, please visit our Contribute page on our website.


Media coverage & conferences

Jonathan McDowell gave an introduction on Reproducible Builds in Debian at the Belfast Linux User Group:

Whilst not strictly related to reproducible builds, Sean Gallagher from Ars Technica wrote an article entitled Researchers find bug in Python script may have affected hundreds of studies:

A programming error in a set of Python scripts commonly used for computational analysis of chemistry data returned varying results based on which operating system they were run on.


Reproducible Builds Summit 2019

Registration for our fifth annual Reproducible Builds summit that will take place between the 1st and 8th December in Marrakesh, Morocco has opened and invitations have been sent out.

Similar to previous incarnations of the event, the heart of the workshop will be three days of moderated sessions with surrounding “hacking” days and will include a huge diversity of participants from Arch Linux, coreboot, Debian, F-Droid, GNU Guix, Google, Huawei, in-toto, MirageOS, NYU, openSUSE, OpenWrt, Tails, Tor Project and many more. We are still seeking additional sponsorship for the event. Sponsoring enables us to enable the attendance of people who would not otherwise be able to attend. If you or your company would be able to sponsor the event, please contact info@reproducible-builds.org.

If you would like to learn more about the event and how to register, please visit our our dedicated event page.


Distribution work

GNU Guix announced that they had significantly reduced the size of their “bootstrap seed” by replacing binutils, GCC and glibc with smaller alternatives resulting in the package manager “possessing a formal description of how to build all underlying software” in a reproducible way from a mere 120MB seed.

OpenWrt is a Linux-based operating system targeting wireless network routers and other embedded devices. This month Paul Spooren (aparcar) posted a patch to their mailing list adding KCFLAGS to the kernel build flags to make it easier to rebuild the official binaries.

Bernhard M. Wiedemann posted his monthly Reproducible Builds status update for the openSUSE distribution which describes how rpm was updated to run most builds with the -flto=auto argument, saving mirror disk space/bandwidth. In addition, maven-javadoc-plugin received a toolchain patch (originating from Debian) in order to normalise a date.

Debian

In Debian this month Didier Raboud (OdyX) started a discussion on the debian-devel mailing list regarding building Debian source packages in a reproducible manner (thread index). In addition, Lukas PĂĽhringer prepared an upload of in-toto, a framework to protect supply chain integrity by the Secure Systems Lab at New York University which was uploaded by Holger Levsen.

Holger Levsen started a new section on the Debian wiki to centralise to document the progress made on various Debian-specific reproducibility issues and noticed that the “essential” package set in the bullseye distribution became unreproducible again, likely due to a a bug in Perl itself. Holger also restarted a discussion on Debian bug #774415 which requests that the devscripts collection of utilities that “make the life of a Debian package maintainer easier” adds a script/wrapper to enable easier end-user testing of whether a package is reproducible.

Johannes Schauer (josch) explained that their mmdebstrap tool can create bit-for-bit identical Debian chroots of the unstable and buster distributions for both the essential and minbase bootstrap “variants”, and Bernhard M. Wiedemann contributed to a discussion regarding adding a “global” build switch to enable/disable Profile-Guided Optimisation (PGO) and Link-time optimisation in the dpkg-buildflags tool, nothing that “overall it is still very hard to get reproducible builds with PGO enabled.”

64 reviews of Debian packages were added, 10 were updated and 35 were removed this month adding to our knowledge about identified issues. Three new types were added by Chris Lamb (lamby): nondeterministic_output_in_code_generated_by_ros_genpy, nondeterministic_ordering_in_include_graphs_generated_by_doxygen & nondeterministic_defaults_in_documentation_generated_by_python_traitlets.

Lastly, there was a far-reaching discussion regarding the correctness and suitability of setting the TZ environment variable to UTC when it was noted that the value UTC0 was “technically” more correct.


Software development

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Lastly, a request from Steven Engler to sort fields in the PKG-INFO files generated by the setuptools Python module build utilities was resolved by Jason R. Coombs and Vagrant Cascadian added SOURCE_DATE_EPOCH support to LTSP’s manual page generation.

strip-nondeterminism & reprotest

strip-nondeterminism is our tool to remove specific non-deterministic results from successful builds. This month, Chris Lamb made a number of changes including uploading version 1.6.1-1 was to Debian unstable. This dropped the bug_803503.zip test fixture as it is no longer compatible with the latest version of Perl’s Archive::Zip module (#940973).

reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, Iñaki Malerba updated our Salsa CI scripts […] as well as adding a --control-build parameter […]. Holger Levsen uploaded the package as 0.7.10, bumping the Debian “standards version” to 4.4.1 […].

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.

This month, Chris Lamb (lamby) made the following changes, including uploading versions 126, 127, 128 and 129 to the Debian unstable distribution:

  • Disassembling and reporting on files related to the R (programming language):

    • Expose an .rdb file’s absolute paths in the semantic/human-readable output, not hidden deep in a hexdump. […]
    • Rework and refactor the handling of .rdb files with respect to locating the parallel .rdx prior to inspecting the file to ensure that we do not add files to the user’s filesystem in the case of directly comparing two .rdb files or — worse — overwriting a file in is place. […]
    • Query the container for the full path of the parallel .rdx file to the .rdb file as well as looking in the same directory. This ensures that comparing two Debian packages shows any varying path. […]
    • Correct the matching of .rds files by also detecting newer versions of this file format. […]
    • Don’t read the site and user environment when comparing .rdx, .rdb or .rds files by using Rscript’s --vanilla option. […][…]
    • Ensure all object names are displayed, including ones beginning with a fullstop (.) […] and sort package fields when dumping data from .rdb files […].
    • Mask/hide standard error when processing .rdb files […] and don’t include useless/misleading NULL when dumping data from them. […]
    • Format package contents as foo = bar rather than using ugly and misleading brackets, etc. […] and include the object’s type […].
    • Don’t pass our long script to parse .rdb files via the command line; use standard input instead. […]
    • Call the deparse function to ensure that we do not error out and revert to a binary diff when processing .rdb files with internal “vector” types; they do not automatically coerce to strings. […]
    • Other misc/cosmetic changes. […][…][…]
  • Output/logging:
    • When printing an error from a command, format the command for the user. […]
    • Truncate very long command lines when displaying them as an external source of data. […]
    • When formatting command lines ensure newlines and other metacharacters appear escaped as \n, etc. […][…]
    • When displaying the standard error from commands, ensure we use the escaped version. […]
    • Use “exit code” over “return code” terminology when referring to UNIX error codes in displayed differences. […]
  • Internal API:
    • Add ability to pass bytestring input to external commands. […]
    • Split out command-line formatting into a separate utility function. […]
    • Add support for easily masking the standard error of commands. […][…]
    • To match the libarchive container, raise a KeyError exception if we request an invalid member from a directory. […]
    • Correct string representation output in the traceback when we cannot locate a specific item in a container. […]
  • Misc:
    • Move build-dependency on python-argcomplete to its Python 3 equivalent to facilitate Python 2.x removal. (#942967)
    • Track and report on missing Python modules. (#72)
    • Move from deprecated $ADTTMP to $AUTOPKGTEST_TMP in the autopkgtests. […]
    • Truncate the tcpdump expected diff to 8KB (from ~600KB). […]
    • Try and ensure that new test data files are generated dynamically, ie. at least no new ones are added without “good” reasons. […]
    • Drop unused BASE_DIR global in the tests. […]

In addition, Mattia Rizzolo updated our tests to run against all supported Python versions […] and to exit with a UNIX exit status of 2 instead of 1 in case of running out of disk space […]. Lastly Vagrant Cascadian updated diffoscope 126 and 129 in GNU Guix, and updated inputs for additional test suite coverage.

trydiffoscope is the web-based version of diffoscope and this month Chris Lamb migrated the tool to depend on the python3-docutils package over python-docutils to allow for Python 2.x removal (#943293) as well as updating the packaging to the latest Debian standards and conventions […][…][…].

Project website

There was yet more effort put into our our website this month, including Chris Lamb improving the formatting of reports  […][…][…][…][…] and tidying the new “Testing framework” links […], etc.

In addition, Holger Levsen add the Tor Project’s Reproducible Builds Manager to our “Who is Involved?” page and Mattia Rizzolo dropped a literal <br> HTML element […].

Test framework

We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:

  • Holger Levsen:
    • Debian-specific changes:
      • Add a script to ease powercycling x86 and arm64 nodes. […][…]
      • Don’t create suite-based directories for buildinfos.debian.net. […]
      • Make all four suites being tested shown in a single row on the performance page. […]
    • OpenWrt changes:
      • Only run jobs every third day. […]
      • Create jobs to run the reproducible_openwrt_rebuild.py script today and in the future. […]
  • Mattia Rizzolo:
    • Add some packages that were lost while updating to buster. […]
    • Fix the auto-offline functionality by checking the content of the permalinks file instead of following the lastSuccessfulBuild that no longer being updated. […]
  • Paul Spooren (OpenWrt):

The usual node maintenance was performed by Holger Levsen […][…], Mattia Rizzolo […][…][…] and Vagrant Cascadian […][…][…].


Getting in touch

If you are interested in contributing the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:



This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levesen and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.




View all our monthly reports