Collaborative Working Sessions - Source tarball reproducibility
== Reproducibility of source tarballs
Problem: version-controlled sources are not used directly for builds.
In most cases, a tarball is used. It can be “raw” (e.g. the output of git archive) or “processed” (e.g. autotools’ make dist or Python’s sdist tarballs). Processed tarballs in particular are an avenue for attack (cf. xz).
One idea: encourage developers to stop using “processed tarballs” and use raw tarballs instead, à la GitHub tarball download.
- historically, the distribution tarball was used to reduce the number of tools that had to be installed on the machine of the end user building the software. This is no longer relevant.
- autotools made it easy to run ./configure for out-of-tree builds, but out-of-tree builds were not the initial motivation for distribution tarballs.
Another idea: “Having an archive in the middle that is not verified by anything is already a problem”:
- the assumption that tarballs match the VCS does not hold in general
- OTOH, tarballs are useful as a standard and efficient way to deliver sources
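The assumption that a tarball matches the VCS can be tested mechanically. The following is a minimal sketch, not an existing tool: it assumes the tarball has a single top-level directory (as git archive --prefix or make dist typically produce) and that a checkout of the corresponding tag is available locally. The function names `tarball_digests`, `tree_digests` and `compare` are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def tarball_digests(tarball_path, strip_components=1):
    """Map each regular file in the tarball to its SHA-256 digest."""
    digests = {}
    with tarfile.open(tarball_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the leading "project-1.0/" style prefix (assumed layout).
            parts = Path(member.name).parts[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return digests


def tree_digests(root):
    """Map each regular file under root (ignoring .git) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue
        digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(tarball_path, checkout_dir):
    """Return (only in tarball, only in checkout, differing content)."""
    tar_d = tarball_digests(tarball_path)
    tree_d = tree_digests(checkout_dir)
    only_tar = sorted(set(tar_d) - set(tree_d))
    only_tree = sorted(set(tree_d) - set(tar_d))
    differing = sorted(k for k in tar_d.keys() & tree_d.keys() if tar_d[k] != tree_d[k])
    return only_tar, only_tree, differing
```

For a processed tarball, the first list is where generated files such as configure would show up; for a raw tarball all three lists would be expected to be empty.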
Another idea: “sdist should be built from one repo”
- Having something that is built from multiple independent parts makes it hard to rebuild.
Many projects use GitHub Actions or equivalent to generate the sdist, so generation is programmatic. This could be checked in a “source rebuilderd”.
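As an illustration of what a reproducible archive step could look like, here is a minimal sketch using only the Python standard library. It is not how rebuilderd or any sdist backend works; `build_archive` and `is_reproducible` are hypothetical names, and the normalization choices (zeroed owner, fixed mtime, sorted entries, only the executable bit preserved) are assumptions about which metadata usually varies between builds.

```python
import gzip
import hashlib
import io
import tarfile
from pathlib import Path


def _normalize(info, mtime):
    """Strip metadata that depends on the build machine or build time."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = mtime
    # Keep only the distinction between executable and non-executable entries.
    info.mode = 0o755 if (info.isdir() or info.mode & 0o100) else 0o644
    return info


def build_archive(src_dir, prefix, mtime=0):
    """Return the bytes of a .tar.gz of src_dir with normalized metadata."""
    src_dir = Path(src_dir)
    buf = io.BytesIO()
    # An explicit mtime and an empty filename keep the gzip header constant.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as tar:
            for path in sorted(src_dir.rglob("*")):  # fixed entry order
                rel = path.relative_to(src_dir)
                if ".git" in rel.parts:
                    continue
                info = tar.gettarinfo(str(path), arcname=f"{prefix}/{rel.as_posix()}")
                if info is None:  # unsupported file type, e.g. a socket
                    continue
                info = _normalize(info, mtime)
                if info.isreg():
                    with open(path, "rb") as f:
                        tar.addfile(info, f)
                else:
                    tar.addfile(info)
    return buf.getvalue()


def is_reproducible(src_dir, prefix):
    """Build twice and compare digests."""
    first = hashlib.sha256(build_archive(src_dir, prefix)).hexdigest()
    second = hashlib.sha256(build_archive(src_dir, prefix)).hexdigest()
    return first == second
```

Compressed output can differ between compression library versions, so a checker comparing archives built on different machines may prefer to compare the uncompressed tar stream.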
Two competing approaches:
- only allow git archive from the source repo
- allow any build process for sdist, as long as it is reproducible
(Some Debian packages use “source git”, i.e. a branch in the upstream repo to build from, which also satisfies the requirement.)
- GitHub’s autogenerated tarballs are not promised to be stable, but are stable in practice.
- GitHub “release tarballs” attached to releases are generated, so they repeat the original issue.
- In principle the operator of the website could inject things into the generated tarball. Verifying the hash of the tarball is a way to fix this.
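Hash verification of a downloaded tarball can be sketched as follows. This is a minimal example under the assumption that the expected SHA-256 digest was recorded earlier from a trusted source (for instance in a distribution's packaging metadata); the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tarball(path, expected_sha256):
    """Return True if the file at path matches the recorded digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

This only detects changes relative to the recorded digest; it says nothing about whether the tarball matched the VCS when the digest was first recorded.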
Historical data for reproducibility is missing from distros, except for Debian.
=== Rebuilderd for sources
Action item: set up rebuilderd for sources.
Would it be possible to extend rebuilderd to test “sdist creation”? Yes: as a “distribution” where the output is an sdist.
Sometimes the version tag used in distros doesn’t match the upstream release tag. repology.org could be used to get the source repo URL.
We don’t care about all tarballs, only about those which were used in distro builds.
Examples of source hash information for distributions:
- https://gitlab.archlinux.org/archlinux/packaging/packages/archlinux-keyring/-/blob/main/.SRCINFO?ref_type=heads
- https://src.fedoraproject.org/rpms/archlinux-keyring/blob/rawhide/f/sources
- https://salsa.debian.org/michel/archlinux-keyring (“source git”)
=== See also
https://whatsrc.org/
A blog post breaking down the xz package in Debian which gives insight into Debian packaging and upstream sources: https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/