Reproducible Builds in April 2020

View all our monthly reports


Welcome to the April 2020 report from the Reproducible Builds project. In our regular reports we outline the most important things that we and the rest of the community have been up to over the past month.

What are reproducible builds? One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News

It was discovered that more than 725 malicious packages were downloaded thousands of times from RubyGems, the official channel for distributing code for the Ruby programming language. Attackers used a variation of “typosquatting”, swapping hyphens and underscores in package names (for example, uploading a malevolent atlas-client in place of atlas_client) to distribute a script that intercepted Bitcoin payments. (Ars Technica report)

Bernhard M. Wiedemann launched ismypackagereproducibleyet.org, a service that takes a package name as input and displays whether the package is reproducible in a number of distributions. For example, it can quickly show the status of Perl as being reproducible on openSUSE but not in Debian. Bernhard also improved the documentation of his “unreproducible package” to add some example patches for hash issues. []

There was a post on Chaos Computer Club’s website listing Ten requirements for the evaluation of “Contact Tracing” apps in relation to the SARS-CoV-2 epidemic. In particular:

4. Transparency and verifiability: The complete source code for the app and infrastructure must be freely available without access restrictions to allow audits by all interested parties. Reproducible build techniques must be used to ensure that users can verify that the app they download has been built from the audited source code.

Elsewhere, Nicolas Boulenguez wrote a patch for the Ada programming language component of the GCC compiler to skip -f.*-prefix-map options when writing Ada Library Information files. Amongst other properties, these .ali files embed the compiler flags used at the time of the build, which results in the absolute build path being recorded via -ffile-prefix-map, -fdebug-prefix-map, etc.
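
The idea of skipping such options can be illustrated with a minimal Python sketch. This is not the GCC patch itself, which modifies the Ada front end; the function name strip_prefix_map_flags and the example build paths are hypothetical, and anchoring the -f.*-prefix-map expression at the start of the option is an assumption about how it is meant to be read.

```python
import re

# Assumption: "-f.*-prefix-map" is read as a regular expression matching the
# option name, which is then followed by "=OLD=NEW".
PREFIX_MAP_RE = re.compile(r"^-f.*-prefix-map=")


def strip_prefix_map_flags(flags):
    """Return the flags with any -f*-prefix-map=... options removed."""
    return [flag for flag in flags if not PREFIX_MAP_RE.match(flag)]


# Hypothetical set of flags as they might be recorded during a build.
recorded = strip_prefix_map_flags([
    "-O2",
    "-g",
    "-ffile-prefix-map=/build/pkg-1.0=.",
    "-fdebug-prefix-map=/build/pkg-1.0=.",
    "-fstack-protector-strong",
])
print(recorded)
```

With the prefix-map options dropped, the recorded flags no longer vary with the directory in which the build happened to run.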

In the Arch Linux project, kpcyrd reported that they held their first “rebuilder workshop”. The session was held on IRC and participants were provided a document with instructions on how to install and use Arch’s repro tool. The meeting resulted in multiple people with no prior experience of Reproducible Builds validating their first package. Later in the month they also announced that it was now possible to run independent rebuilders under Arch in a “hands-off, everything just works™” solution to distributed package verification.

Mathias Lang submitted a pull request against dmd, the canonical compiler for the ‘D’ programming language, to add support for our SOURCE_DATE_EPOCH environment variable as well as the other C preprocessor tokens such as __DATE__, __TIME__ and __TIMESTAMP__, which was subsequently merged. SOURCE_DATE_EPOCH defines a distribution-agnostic standard for build toolchains to consume and emit timestamps in situations where they are deemed to be necessary. []
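
As a rough illustration of how a build tool might consume SOURCE_DATE_EPOCH, here is a minimal Python sketch. It is not taken from the dmd change; the helper names build_timestamp and format_build_date are hypothetical, and the fallback to the current time when the variable is unset is an assumption about the desired behaviour.

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the build time as seconds since the Unix epoch.

    If SOURCE_DATE_EPOCH is set, it is used instead of the current time.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def format_build_date(timestamp):
    """Format a timestamp in UTC so the result ignores the local timezone."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))


# Passing an explicit mapping here stands in for an exported variable.
stamp = format_build_date(build_timestamp({"SOURCE_DATE_EPOCH": "1588291200"}))
print(stamp)
```

Formatting in UTC matters as much as the fixed value: a timestamp rendered in the builder's local timezone would still differ between machines.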

The Telegram instant-messaging platform announced that they had updated to version 5.1.1, continuing their claim that they are reproducible according to their full instructions and therefore verifying that its original source code is exactly the same code that is used to build the versions available on the Apple App Store and Google Play distribution platforms respectively.

Lastly, Hervé Boutemy reported that 97% of the current development versions of various Maven packages appear to have a reproducible build. []


Distribution work

In Debian this month, 89 reviews of Debian packages were added, 21 were updated and 33 were removed, adding to our knowledge about identified issues. Many issue types were noticed, categorised and updated by Chris Lamb, including:

In addition, Holger Levsen filed a feature request against debrebuild, a tool for rebuilding a Debian package given a .buildinfo file, proposing to add --standalone or --one-shot-mode functionality.


In openSUSE, Bernhard M. Wiedemann made the following changes:

In Arch Linux, a rebuilder instance has been set up at reproducible.archlinux.org that is rebuilding Arch’s [core] repository directly. The first rebuild found approximately 90% of packages to be reproducible, in contrast with the 94% shown on the Reproducible Builds project’s own Arch Linux status page on tests.reproducible-builds.org, which continuously builds packages but does not verify Arch Linux packages. More information may be found on the corresponding wiki page and the underlying decisions were explained on our mailing list.


Software development

diffoscope

Chris Lamb made the following changes to diffoscope, the Reproducible Builds project’s in-depth and content-aware diff utility that can locate and diagnose reproducibility issues (including preparing and uploading versions 139, 140, 141, 142 and 143 to Debian which were subsequently uploaded to the backports repository):

  • Comparison improvements:

    • Dalvik .dex files can also serve as APK containers so restrict the narrower identification of .dex files to files ending with this extension and widen the identification of APK files to when file(1) discovers a Dalvik file. (#28)
    • Add support for Hierarchical Data Format (HDF5) files. (#95)
    • Add support for .p7c and .p7b certificates. (#94)
    • Strip paths from the output of zipinfo(1) warnings. (#97)
    • Don’t uselessly include the JSON “similarity” percentage if it is “0.0%”. []
    • Render multi-line difference comments in a way to show indentation. (#101)
  • Testsuite improvements:

    • Add pdftotext as a requirement to run the PDF test_metadata test. (#99)
    • apktool 2.5.0 changed the handling of output of XML schemas so update and restrict the corresponding test to match. (#96)
    • Explicitly list python3-h5py in debian/tests/control.in to ensure that we have this module installed during a test run to generate the fixtures in these tests. []
    • Correct parsing of ./setup.py test --pytest-args arguments. []
  • Misc:

    • Capitalise “Ordering differences only” in text comparison comments. []
    • Improve documentation of FILE_TYPE_HEADER_PREFIX and FALLBACK_FILE_TYPE_HEADER_PREFIX to highlight that only the first 16 bytes are used. []

Michael Osipov created a well-researched merge request to return diffoscope to using zipinfo directly instead of piping input via /dev/stdin in order to ensure portability to the BSD operating system []. In addition, Ben Hutchings documented how --exclude arguments are matched against filenames [] and Jelle van der Waa updated the LLVM test fixture difference for LLVM version 10 [] as well as adding a reference to the name of the h5dump tool in Arch Linux [].

Lastly, Mattia Rizzolo also fixed an incorrect build dependency [] and Vagrant Cascadian enabled diffoscope to locate the openssl and h5dump packages on GNU Guix [][], and updated diffoscope in GNU Guix to version 141 [] and 143 [].

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In April, Chris Lamb made the following changes:

  • Add deprecation plans to all handlers documenting how — or if — they could be disabled and eventually removed, etc. (#3)
  • Normalise *.sym files as Java archives. (#15)
  • Add support for custom .zip filename filtering and exclude two patterns of files generated by Maven projects in “fork” mode. (#13)
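
To illustrate the general idea of post-processing a build artefact to remove non-determinism, here is a minimal Python sketch for .zip files. It does not reflect how strip-nondeterminism itself is implemented; the names normalise_zip and make_zip are hypothetical, and the choices to sort members and to pin timestamps and permissions to fixed values are assumptions made for this example.

```python
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can store


def normalise_zip(data):
    """Return a copy of a ZIP archive with sorted members and fixed metadata."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(output, "w") as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.external_attr = 0o644 << 16  # fixed permissions
            dst.writestr(info, src.read(name))
    return output.getvalue()


def make_zip(names, date_time):
    """Build a small in-memory archive to stand in for a build output."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(zipfile.ZipInfo(name, date_time=date_time), b"data")
    return buffer.getvalue()


# Two "builds" with the same content but different member order and timestamps.
first = make_zip(["a.txt", "b.txt"], (2020, 4, 1, 12, 0, 0))
second = make_zip(["b.txt", "a.txt"], (2020, 4, 30, 8, 30, 0))
print(first == second)
print(normalise_zip(first) == normalise_zip(second))
```

The two archives differ byte-for-byte before normalisation and compare equal afterwards, which is the property a post-build normalisation step aims for.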

disorderfs

disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues.

This month, Chris Lamb fixed a long-standing issue by not dropping UNIX groups in FUSE multi-user mode when we are not root (#1) and uploaded version 0.5.9-1 to Debian unstable. Vagrant Cascadian subsequently refreshed disorderfs in GNU Guix to version 0.5.9 [].
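
One kind of issue that varying directory results can flush out is code that depends on the order in which a directory is listed. The following minimal Python sketch contrasts the two approaches; the function names and file names are hypothetical and are not taken from any particular package.

```python
import os
import tempfile


def list_sources_unstable(directory):
    """Return entries in whatever order the filesystem provides them."""
    return os.listdir(directory)


def list_sources_stable(directory):
    """Sort the entries so the result is independent of directory ordering."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in a deliberately non-alphabetical order.
    for name in ("b.c", "a.c", "c.c"):
        open(os.path.join(tmp, name), "w").close()
    unstable = list_sources_unstable(tmp)
    stable = list_sources_stable(tmp)
    print(stable)
```

A build that feeds the unsorted list to a linker or archiver may produce different output on different filesystems, whereas the sorted variant does not depend on the underlying ordering.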

Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

In addition, Bernhard informed the following projects that their packages are not reproducible:

  • acoular (report unknown non-determinism)
  • cri-o (report a date issue)
  • gnutls (report certtool being unable to extend certificates beyond 2049)
  • gnutls (report copyright year variation)
  • libxslt (report a bug about non-deterministic output from data corruption)
  • python-astropy (report a future build failure in 2021)

Project documentation

This month, Chris Lamb made a large number of changes to our website and documentation in the following categories:

  • Community engagement improvements:

    • Update instructions to register for Salsa on our Contribute page now that the signup process has been overhauled. []
    • Make it clearer that joining the rb-general mailing list is probably a first step for contributors to take. []
    • Make our full contact information easier to find in the footer (#19) and improve text layout using bullets to separate sections [].
  • Accessibility:

    • To improve accessibility, make all links underlined. (#12)
    • Use an enhanced foreground/background contrast ratio of 7.04:1. (#11)
  • General improvements:

  • Internals:

    • Move to using jekyll-redirect-from over manual redirect pages [][] and add a redirect from /docs/buildinfo/ to /docs/recording/. (#23)
    • Limit the website self-check to not scan generated files [] and remove the “old layout” checker now that all of them have been migrated [].
    • Move the news archive under the /news/ namespace [] and improve formatting of archived news links [].
    • Various improvements to the draft template generation. [][][][]

In addition, Holger Levsen clarified exactly which month we ceased to do weekly reports [] and Mattia Rizzolo adjusted the title style of an event page [].

Marcus Hoffman also started a discussion on our website’s issue tracker asking for clarification on embedded signatures and Chris Lamb subsequently replied and asked Marcus to go ahead and propose a concrete change.

Testing framework

We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org which, amongst many other tasks, tracks the status of our reproducibility efforts and identifies any regressions that have been introduced.

  • Chris Lamb:

    • Print the build environment prior to executing a build. []
    • Drop a misleading disorderfs-debug prefix in log output when we change non-disorderfs things in the file and, as it happens, do not run disorderfs at all. []
    • Fix a comma/bullet spacing issue on the package report pages, which was caused by the CSS adding a margin to all <a> HTML elements under <li> ones. []
    • Tidy the copy in the project links sidebar. []
  • Holger Levsen:

    • General:
    • Debian:

      • Reduce scheduling frequency of the buster distribution on the arm64 architecture, etc. [][]
      • Show builds per day on a per-architecture basis for the last year on the Debian dashboard. []
      • Drop the Subgraph OS package set as development halted in 2017 or 2018. []
      • Update debrebuild to the version from the latest release of devscripts. [][]
      • Add or improve various parts of the documentation. [][][]
    • Work on a Debian rebuilder:

      • Integrate sbuild. [][][][][]
      • Select a random .buildinfo file and attempt to build and compare the result. [][][][]
      • Improve output and related output formatting. [][][][][]
      • Outline next steps for the development of the tool. [][][]
      • Various refactoring and code improvements. [][][]

Lastly, Mattia Rizzolo fixed some log parsing code regarding potentially-harmless warnings from package installation [][] and the usual build node maintenance was performed by Holger Levsen [][][] and Mattia Rizzolo [][][].


Misc news

On our mailing list this month, Santiago Torres asked whether we were still publishing releases of our tools to our website and Chris Lamb replied that this was not the case and fixed the issue. Later in the month Santiago also reported that the signature for the disorderfs package did not pass its GPG verification which was also fixed by Chris Lamb.

Hans-Christoph Steiner of the Guardian Project asked whether there would be interest in making our website translatable which resulted in a WIP merge request being filed against the website and a discussion on how to track translation updates.


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:


This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Daniel Shahaf, Holger Levsen, Jelle van der Waa, kpcyrd, Mattia Rizzolo and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.



