Welcome to the February 2022 report from the Reproducible Builds project. In these reports, we try to round up the important things we and others have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Jiawen Xiong, Yong Shi, Boyuan Chen, Filipe R. Cogo and Zhen Ming Jiang have published a new paper titled Towards Build Verifiability for Java-based Systems (PDF). The abstract of the paper contains the following:
Various efforts towards build verifiability have been made to C/C++-based systems, yet the techniques for Java-based systems are not systematic and are often specific to a particular build tool (eg. Maven). In this study, we present a systematic approach towards build verifiability on Java-based systems.
GitBOM is a flexible scheme to track the source code used to generate build artifacts via Git-like unique identifiers. Although the project has been active for a while, the community around GitBOM has now started running weekly community meetings.
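To illustrate what a "Git-like unique identifier" looks like, here is a minimal sketch of how Git itself derives the ID of a blob: it hashes a short header followed by the file content. This shows only Git's own blob hashing; the details of GitBOM's scheme are not described in this report, and the function name `git_blob_id` is made up for the example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID that Git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The ID depends only on the content, not on file names or timestamps.
print(git_blob_id(b"hello world\n"))
```

Because the identifier is a pure function of the content, two parties holding the same source file will always compute the same ID.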
The paper by Chris Lamb and Stefano Zacchiroli is now available in the March/April 2022 issue of IEEE Software. Titled Reproducible Builds: Increasing the Integrity of Software Supply Chains (PDF), the abstract of the paper contains the following:
We first define the problem, and then provide insight into the challenges of making real-world software build in a “reproducible” manner - this is, when every build generates bit-for-bit identical results. Through the experience of the Reproducible Builds project making the Debian Linux distribution reproducible, we also describe the affinity between reproducibility and quality assurance (QA).
In openSUSE, Bernhard M. Wiedemann posted his monthly reproducible builds status report.
On our mailing list this month, Thomas Schmitt started a thread around the SOURCE_DATE_EPOCH specification, concerning formats that cannot help embedding potentially timezone-specific timestamps. (Full thread index.)
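For readers unfamiliar with the variable, the following is a minimal sketch of how a build tool commonly consumes SOURCE_DATE_EPOCH: read it from the environment, fall back to the current time if it is unset, and render it in UTC so the output does not depend on the build machine's timezone. The function names are hypothetical, and the sketch does not address the harder case raised in the thread, where a file format itself stores local time.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> datetime:
    """Return the build time as a timezone-aware UTC datetime."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    # Fall back to the current time only when the variable is not set.
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def format_build_date(environ=os.environ) -> str:
    """Render the build time as text, always in UTC."""
    return build_timestamp(environ).strftime("%Y-%m-%d %H:%M:%S UTC")


print(format_build_date({"SOURCE_DATE_EPOCH": "1646092800"}))
```

Passing the environment as a parameter is only a convenience for testing; a real tool would normally read `os.environ` directly.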
The Yocto Project is pleased to report that its core metadata (OpenEmbedded-Core) is now reproducible for all recipes (100% coverage) after issues with newer languages such as Golang were resolved. This was announced in their recent Year in Review publication. It is of particular interest for security updates, so that systems can have specific components updated while reducing the risk of other unintended changes and making it very clear for audit which sections of the system are changing.
The project is now also making heavy use of “equivalence” of build output to determine whether further items in builds need to be rebuilt or whether cached previously built items can be used. As mentioned in the article above, there are now public servers sharing this equivalence information. Reproducibility is key in making this possible and effective to reduce build times/costs/resource usage.
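The idea of output "equivalence" can be sketched roughly as follows: if rebuilding a component yields output with the same hash as last time, nothing that depends on it needs to be rebuilt. This is a simplified illustration under that assumption, not the Yocto Project's actual implementation; the class `EquivalenceCache` and its method are hypothetical names.

```python
import hashlib


def output_hash(data: bytes) -> str:
    """Hash a build output; identical bytes always give an identical digest."""
    return hashlib.sha256(data).hexdigest()


class EquivalenceCache:
    """Hypothetical record of the output hash last seen for each task."""

    def __init__(self):
        self._seen = {}

    def dependents_need_rebuild(self, task: str, output: bytes) -> bool:
        """Return True if this output differs from the previous one for the task."""
        digest = output_hash(output)
        changed = self._seen.get(task) != digest
        self._seen[task] = digest
        return changed


cache = EquivalenceCache()
print(cache.dependents_need_rebuild("libfoo", b"object code v1"))  # first build
print(cache.dependents_need_rebuild("libfoo", b"object code v1"))  # identical rebuild
```

The scheme only pays off when builds are reproducible: an unreproducible task would produce a new hash on every rebuild and force its dependents to rebuild each time.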
diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 203, 204, 205 and 206 to Debian unstable, as well as made the following changes to the code itself:
- Bug fixes:
  - Fix a file(1)-related regression where Debian .changes files that contained non-ASCII text were not identified as such, therefore resulting in seemingly arbitrary packages not actually comparing the nested files themselves. The non-ASCII parts were typically in the Maintainer or in the changelog text. […][…]
  - Fix a regression when comparing directories against non-directories. […][…]
  - If we fail to scan using binwalk, return False from BinwalkFile.recognizes. […]
  - If we fail to import binwalk, don’t report that we are missing the Python rpm module! […]
- Testsuite improvements:
- Codebase improvements:
In addition, Mattia Rizzolo updated the Debian packaging to ensure that the diffoscope and diffoscope-minimal packages have the same version. […]
Debian-related updates
Vagrant Cascadian wrote to the debian-devel mailing list after noticing that the binutils source package contained unreproducible logs in one of its binary packages. Vagrant expanded the discussion to one about all kinds of build metadata in packages and outlined a number of potential solutions that support reproducible builds and arbitrary metadata.
Vagrant also started a discussion on debian-devel after identifying a large number of packages that embed build paths via RPATH when building with CMake, including a list of packages (grouped by Debian maintainer) affected by this issue. Maintainers were requested to check whether their package still builds correctly when passing the -DCMAKE_BUILD_RPATH_USE_ORIGIN=ON directive.
On our mailing list this month, kpcyrd announced the release of rebuilderd-debian-buildinfo-crawler, a tool that parses the Packages.xz Debian package index file, attempts to discover the right .buildinfo file from buildinfos.debian.net and outputs it in a format that can be understood by rebuilderd. The tool, which is available on GitHub, solves a problem regarding correlating Debian version numbers with their builds.
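As a rough illustration of the first step, reading a Packages.xz index, the sketch below decompresses the data and splits it into stanzas of `Key: value` fields. It is not taken from rebuilderd-debian-buildinfo-crawler and says nothing about how that tool works internally; the function name and the sample data are made up for the example.

```python
import lzma


def parse_packages_index(compressed: bytes):
    """Yield one dict per stanza of an xz-compressed Packages index."""
    text = lzma.decompress(compressed).decode("utf-8")
    # Stanzas are separated by blank lines.
    for stanza in text.split("\n\n"):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key is not None:
                # Continuation line: it belongs to the previous field.
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            yield fields


# Made-up sample data standing in for a real Packages.xz file.
sample = lzma.compress(b"Package: hello\nVersion: 2.10-2\nArchitecture: amd64\n")
for entry in parse_packages_index(sample):
    print(entry["Package"], entry["Version"], entry["Architecture"])
```

The package name, version and architecture extracted this way are the kind of information one would need in order to look up a matching .buildinfo file.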
bauen1 provided two patches for debian-cd, the software used to make Debian installer images. This involved passing --invariant and -i deb00001 to mkfs.msdos(8) and avoiding embedding timestamps into the gzipped Packages and Translations files. After some discussion, the patches in question were merged and will be included in debian-cd version 3.1.36.
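The gzip format stores a modification time in its header, which is why compressing the same file twice can produce different bytes. The following minimal sketch shows the general technique of pinning that timestamp, here using Python's `gzip.compress` with `mtime=0`. It illustrates the idea only and is not the change made to debian-cd; the helper name is hypothetical.

```python
import gzip


def reproducible_gzip(data: bytes) -> bytes:
    """Compress data with the gzip header timestamp fixed to zero."""
    return gzip.compress(data, mtime=0)


first = reproducible_gzip(b"Package: hello\n")
second = reproducible_gzip(b"Package: hello\n")
# Bytes 4 to 7 of a gzip stream hold the modification time.
print(first == second, first[4:8])
```

With the timestamp fixed, the compressed output depends only on the input bytes and the compression settings.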
Roland Clobus wrote another in-depth status update about the status of ‘live’ Debian images, summarising the current situation that “all major desktops build reproducibly with bullseye, bookworm and sid”.
The python3.10 package was uploaded to Debian by doko, fixing an issue where .pyc files were not reproducible because the elements in frozenset data structures were not ordered reproducibly. This meant that it was not possible to create a bit-for-bit reproducible Debian chroot which included .pyc files. At the time of writing, the only remaining unreproducible part of a standard chroot is man-db, but Guillem Jover has a patch for update-alternatives which will likely be part of the next release of dpkg.
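The class of problem can be shown in a few lines. The iteration order of a set or frozenset is not guaranteed, and for strings it depends on per-process hash randomisation unless PYTHONHASHSEED is fixed, so writing elements out in iteration order can differ between runs. Sorting before serialising removes that dependency. This is a sketch of the general principle with a hypothetical helper, not the fix applied in the python3.10 package.

```python
def stable_serialize(items) -> str:
    """Serialise a set-like collection of strings in sorted order."""
    # sorted() gives the same sequence regardless of the set's internal layout.
    return ",".join(sorted(items))


names = frozenset({"gamma", "alpha", "beta"})
print(stable_serialize(names))
```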
Elsewhere in Debian, 139 reviews of Debian packages were added, 29 were updated and 17 were removed this month, adding to our knowledge about identified issues. A large number of issue types have been updated too, including the addition of captures_kernel_variant, erlang_escript_file, captures_build_path_in_r_rdb_rds_databases, captures_build_path_in_vo_files_generated_by_coq and build_path_in_vo_files_generated_by_coq.
Website updates
There were quite a few changes to the Reproducible Builds website and documentation this month as well, including:
- Chris Lamb:
- Daniel Shahaf:
- Holger Levsen:
  - Make a huge number of changes to the Who is involved? page, including pre-populating a large number of contributors who cannot be identified from the metadata of the website itself. […][…][…][…][…]
  - Improve linking to sponsors in sidebar navigation. […]
  - Drop the sponsors paragraph as the navigation is clearer now. […]
  - Add Mullvad VPN as a bronze-level sponsor. […][…]
- Vagrant Cascadian:
  - Remove a stray parenthesis from the Who is involved? page. […]
Upstream patches
The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. February’s patches included the following:
- Bernhard M. Wiedemann:
  - btop (sort-related issue)
  - complexity (date)
  - giac (update the version with upstreamed date patch)
  - htcondor (use CMake timestamp)
  - libint (readdir system call related)
  - libnet (date-related issue)
  - librime-lua (sort filesystem ordering)
  - linux_logo (sort-related issue)
  - micro-editor (date-related issue)
  - openvas-smb (date-related issue)
  - ovmf (sort-related issue)
  - paperjam (date-related issue)
  - python-PyQRCode (date-related issue)
  - quimb (single-CPU build failure)
  - radare2 (Meson date/time-related issue)
  - radare2 (Rework SOURCE_DATE_EPOCH usage to be portable)
  - siproxd (date, with Sebastian Kemper + follow-up)
  - xonsh (Address Space Layout Randomisation-related issue)
  - xsnow (date & tar(1)-related issue)
  - zip (toolchain issue related to filesystem ordering)
- Chris Lamb:
  - #1005029 filed against ltsp (forwarded upstream).
  - #1005197 filed against pcmemtest.
  - #1005825 filed against hatchling.
  - #1005826 filed against mpl-sphinx-theme (forwarded upstream).
  - #1005827 filed against gap-hapcryst.
  - #1005901 filed against tree-puzzle.
  - #1005954 filed against jcabi-aspects.
  - #1005955 filed against paper-icon-theme.
- Roland Clobus:
- Vagrant Cascadian:
  - #1005408 filed against wcwidth.
  - #1005420 filed against xir.
  - #1005421 filed against xir.
  - #1005726 filed against ruby-github-markup.
  - #1005727 filed against ruby-tioga.
  - #1005792 filed against btop.
  - #1005793 filed against libadwaita-1.
  - #1005794 filed against snibbetracker.
  - #1006252 filed against cctbx.
  - #1006254 filed against mdnsd.
  - #1006256 filed against gmerlin.
  - #1006302 filed against beav.
  - #1006385 filed against krita.
  - #1006407 filed against qt6-base.
  - #1006455 filed against onevpl-intel-gpu.
  - #1006471 filed against ruby3.0.
  - #1006473 filed against nix.
  - #1006474 filed against foma.
  - #1006476 filed against ruby3.0.
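Many of the patches listed above fall into two recurring categories: filesystem ordering (the order in which readdir returns entries is not stable) and embedded dates. As a hedged illustration of how both are typically addressed, the sketch below builds a tar archive by visiting files in sorted order and overwriting the per-file metadata that varies between machines. It is a generic example rather than any of the listed patches, handles regular files only, and the function name `deterministic_tar` is hypothetical.

```python
import io
import os
import tarfile


def deterministic_tar(root: str, mtime: int = 0) -> bytes:
    """Archive the regular files under root in sorted order with fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # sort in place so the traversal order is stable
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                info = tar.gettarinfo(path, arcname=os.path.relpath(path, root))
                # Overwrite metadata that differs between build machines.
                info.mtime = mtime
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                with open(path, "rb") as handle:
                    tar.addfile(info, handle)
    return buffer.getvalue()
```

In a real build, `mtime` would usually be taken from SOURCE_DATE_EPOCH rather than left at zero, and file permissions may need normalising as well, since they depend on the umask in effect when the files were created.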
Testing framework
The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
- Daniel Golle:
- Holger Levsen:
  - Temporarily use a different Git tree when building OpenWrt as our tests had been broken since September 2020. This was reverted after the patch in question was accepted by Paul Spooren into the canonical openwrt.git repository the next day.
  - Various improvements to debugging OpenWrt reproducibility. […][…][…][…][…]
  - Ignore useradd warnings when building packages. […]
  - Update the script to powercycle armhf architecture nodes to add a hint to where nodes named virt-*. […]
  - Update the node health check to also fix failed logrotate and man-db services. […]
- Mattia Rizzolo:
- Vagrant Cascadian:
Node maintenance was also performed by Holger Levsen […] and Vagrant Cascadian […].
Finally…
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mailing list: rb-general@lists.reproducible-builds.org