Reproducible Builds in February 2022

View all our monthly reports


Welcome to the February 2022 report from the Reproducible Builds project. In these reports, we try to round-up the important things we and others have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.


Jiawen Xiong, Yong Shi, Boyuan Chen, Filipe R. Cogo and Zhen Ming Jiang have published a new paper titled Towards Build Verifiability for Java-based Systems (PDF). The abstract of the paper contains the following:

Various efforts towards build verifiability have been made to C/C++-based systems, yet the techniques for Java-based systems are not systematic and are often specific to a particular build tool (eg. Maven). In this study, we present a systematic approach towards build verifiability on Java-based systems.


GitBOM is a flexible scheme to track the source code used to generate build artifacts via Git-like unique identifiers. Although the project has been active for a while, the community around GitBOM has now started running weekly community meetings.


The paper Chris Lamb and Stefano Zacchiroli is now available in the March/April 2022 issue of IEEE Software. Titled Reproducible Builds: Increasing the Integrity of Software Supply Chains (PDF), the abstract of the paper contains the following:

We first define the problem, and then provide insight into the challenges of making real-world software build in a “reproducible” manner-this is, when every build generates bit-for-bit identical results. Through the experience of the Reproducible Builds project making the Debian Linux distribution reproducible, we also describe the affinity between reproducibility and quality assurance (QA).


In openSUSE, Bernhard M. Wiedemann posted his monthly reproducible builds status report.


On our mailing list this month, Thomas Schmitt started a thread around the SOURCE_DATE_EPOCH specification related to formats that cannot help embedding potentially timezone-specific timestamp. (Full thread index.)


The Yocto Project is pleased to report that it’s core metadata (OpenEmbedded-Core) is now reproducible for all recipes (100% coverage) after issues with newer languages such as Golang were resolved. This was announced in their recent Year in Review publication. It is of particular interest for security updates so that systems can have specific components updated but reducing the risk of other unintended changes and making the sections of the system changing very clear for audit.

The project is now also making heavy use of “equivalence” of build output to determine whether further items in builds need to be rebuilt or whether cached previously built items can be used. As mentioned in the article above, there are now public servers sharing this equivalence information. Reproducibility is key in making this possible and effective to reduce build times/costs/resource usage.


diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 203, 204, 205 and 206 to Debian unstable, as well as made the following changes to the code itself:

  • Bug fixes:

    • Fix a file(1)-related regression where Debian .changes files that contained non-ASCII text were not identified as such, therefore resulting in seemingly arbitrary packages not actually comparing the nested files themselves. The non-ASCII parts were typically in the Maintainer or in the changelog text. [][]
    • Fix a regression when comparing directories against non-directories. [][]
    • If we fail to scan using binwalk, return False from BinwalkFile.recognizes. []
    • If we fail to import binwalk, don’t report that we are missing the Python rpm module! []
  • Testsuite improvements:

    • Add a test for recent file(1) issue regarding .changes files. []
    • Use our assert_diff utility where we can within the test_directory.py set of tests. []
    • Don’t run our binwalk-related tests as root or fakeroot. The latest version of binwalk has some new security protection against this. []
  • Codebase improvements:

    • Drop the _PATH suffix from module-level globals that are not paths. []
    • Tidy some control flow in Difference._reverse_self. []
    • Don’t print a warning to the console regarding NT_GNU_BUILD_ID changes. []

In addition, Mattia Rizzolo updated the Debian packaging to ensure that diffoscope and diffoscope-minimal packages have the same version. []


Vagrant Cascadian wrote to the debian-devel mailing list after noticing that the binutils source package contained unreproducible logs in one of its binary packages. Vagrant expanded the discussion to one about all kinds of build metadata in packages and outlines a number of potential solutions that support reproducible builds and arbitrary metadata.

Vagrant also started a discussion on debian-devel after identifying a large number of packages that embed build paths via RPATH when building with CMake, including a list of packages (grouped by Debian maintainer) affected by this issue. Maintainers were requested to check whether their package still builds correctly when passing the -DCMAKE_BUILD_RPATH_USE_ORIGIN=ON directive.

On our mailing list this month, kpcyrd announced the release of rebuilderd-debian-buildinfo-crawler a tool to parse the Packages.xz Debian package index file, attempts to discover the right .buildinfo file from buildinfos.debian.net and outputs it in a format that can be understood by rebuilderd. The tool, which is available on GitHub, solves a problem regarding correlating Debian version numbers with their builds.

bauen1 provided two patches for debian-cd, the software used to make Debian installer images. This involved passing --invariant and -i deb00001 to mkfs.msdos(8) and avoided embedding timestamps into the gzipped Packages and Translations files. After some discussion, the patches in question were merged and will be included in debian-cd version 3.1.36.

Roland Clobus wrote another in-depth status update about status of ‘live’ Debian images, summarising the current situation that “all major desktops build reproducibly with bullseye, bookworm and sid”.

The python3.10 package was uploaded to Debian by doko, fixing an issue where [.pyc files were not reproducible because the elements in frozenset data structures were not ordered reproducibly. This meant that to creating a bit-for-bit reproducible Debian chroot which included .pyc files was not reproducible. As of writing, the only remaining unreproducible parts of a standard chroot is man-db, but Guillem Jover has a patch for update-alternatives which will likely be part of the next release of dpkg.

Elsewhere in Debian, 139 reviews of Debian packages were added, 29 were updated and 17 were removed this month adding to our knowledge about identified issues. A large number of issue types have been updated too, including the addition of captures_kernel_variant, erlang_escript_file, captures_build_path_in_r_rdb_rds_databases, captures_build_path_in_vo_files_generated_by_coq and build_path_in_vo_files_generated_by_coq.


Website updates

There were quite a few changes to the Reproducible Builds website and documentation this month as well, including:

  • Chris Lamb:

  • Daniel Shahaf:

    • Try a different Markdown footnote content syntax to work around a rendering issue. [][][]
  • Holger Levsen:

    • Make a huge number of changes to the Who is involved? page, including pre-populating a large number of contributors who cannot be identified from the metadata of the website itself. [][][][][]
    • Improve linking to sponsors in sidebar navigation. []
    • drop sponsors paragraph as the navigation is clearer now. []
    • Add Mullvad VPN as a bronze-level sponsor . [][]
  • Vagrant Cascadian:


Upstream patches

The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. February’s patches included the following:


Testing framework

The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:

  • Daniel Golle:

    • Update the OpenWrt configuration to not depend on the host LLVM, adding lines to the .config seed to build LLVM for eBPF from source. []
    • Preserve more OpenWrt-related build artifacts. []
  • Holger Levsen:

  • Temporary use a different Git tree when building OpenWrt as our tests had been broken since September 2020. This was reverted after the patch in question was accepted by Paul Spooren into the canonical openwrt.git repository the next day.
    • Various improvements to debugging OpenWrt reproducibility. [][][][][]
    • Ignore useradd warnings when building packages. []
    • Update the script to powercycle armhf architecture nodes to add a hint to where nodes named virt-*. []
    • Update the node health check to also fix failed logrotate and man-db services. []
  • Mattia Rizzolo:

    • Update the website job after contributors.sh script was rewritten in Python. []
    • Make sure to set the DIFFOSCOPE environment variable when available. []
  • Vagrant Cascadian:

    • Various updates to the diffoscope timeouts. [][][]

Node maintenance was also performed by Holger Levsen [] and Vagrant Cascadian [].


Finally…

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:




View all our monthly reports