Collaborative Working Sessions - Source tarball reproducibility

== Reproducibility of source tarballs

Problem: version-controlled sources are not used directly for builds.

In most cases, a tarball is used instead. It can be “raw” (e.g. the output of git archive) or “processed” (e.g. autotools’ make dist or Python’s sdist tarballs). Processed tarballs in particular are an avenue for attack (cf. the xz backdoor).

One idea: encourage developers to stop using “processed tarballs” and use raw tarballs instead, à la GitHub’s tarball downloads.
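
One rough way to see what a processed tarball adds is to list the files that appear in the release tarball but not in the `git archive` export of the same tag. In the xz case, a modified `m4/build-to-host.m4` was present in the release tarball but not in the git repository. The sketch below is a minimal illustration. It assumes both archives are already available as bytes and that each has a single top-level directory. The function names are illustrative.

```python
import io
import tarfile


def member_paths(tar_bytes: bytes) -> set[str]:
    """Regular-file paths in a (possibly compressed) tarball, minus the top-level dir.

    Assumes every member sits under one prefix such as "project-1.0/", as with
    `git archive --prefix=project-1.0/` and most release tarballs.
    """
    paths = set()
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            # Drop the leading "project-1.0/" component so both trees line up.
            _, _, rest = member.name.partition("/")
            paths.add(rest or member.name)
    return paths


def extra_files(release_tar: bytes, git_archive_tar: bytes) -> list[str]:
    """Files shipped in the release tarball that are absent from the VCS export."""
    return sorted(member_paths(release_tar) - member_paths(git_archive_tar))
```

Some differences are legitimate, for example files excluded from `git archive` via `export-ignore` in `.gitattributes`. The resulting list is therefore something to review, not an automatic verdict.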

Another idea: “Having an archive in the middle that is not verified by anything is already a problem”:

Another idea: “sdist should be built from one repo”

Many projects use GitHub Actions or an equivalent to generate sdists, so generation is programmatic. This could be checked by a “source rebuilderd”.

Two competing approaches:

  1. only allow git archive from the source repo
  2. allow any build process for sdist, as long as it is reproducible
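
Under approach 2, a rebuilder has to compare a freshly built sdist with the published one. The sketch below assumes both are available as tar bytes. It reports bit-for-bit identity first. If that fails, it falls back to a weaker, content-level check that ignores metadata such as mtimes, owners and compression. The function names and the three labels are illustrative only.

```python
import hashlib
import io
import tarfile


def content_digest(tar_bytes: bytes) -> str:
    """SHA-256 over (path, file hash) pairs only, ignoring tar metadata.

    Deliberately weaker than bit-for-bit reproducibility: equal digests mean the
    same files with the same bytes, but headers (mtime, owner, mode) may differ.
    """
    entries = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            entries.append((member.name, hashlib.sha256(data).hexdigest()))
    outer = hashlib.sha256()
    for name, file_hash in sorted(entries):
        # One "<sha256>  <path>" line per file, in sorted path order.
        outer.update(f"{file_hash}  {name}\n".encode())
    return outer.hexdigest()


def classify(published: bytes, rebuilt: bytes) -> str:
    """Compare a published sdist with a rebuilt one."""
    if published == rebuilt:
        return "bit-for-bit"
    if content_digest(published) == content_digest(rebuilt):
        return "same-content"
    return "different"
```

A "same-content" result still falls short of full reproducibility. However, it separates harmless metadata noise from tarballs that actually ship different code.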

(Some Debian packages use “source git”, i.e. a branch in the upstream repo to build from, which also satisfies this requirement.)

Historical reproducibility data is missing for all distros except Debian.

=== Rebuilderd for sources

Action item: set up rebuilderd for sources.

Would it be possible to extend rebuilderd to test “sdist creation”? Yes: treat it as a “distribution” whose output is sdists.
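
A source rebuilder could loosely follow rebuilderd’s model: rebuild an artifact, then compare it with a reference hash. The sketch below is hypothetical; `SourceJob`, `run_jobs` and the `rebuild` callback are not part of rebuilderd. Each job carries the SHA-256 that the distro recorded for the tarball it actually built from. The callback stands in for whatever produces the sdist, such as a `git archive` export or a build-backend run.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourceJob:
    # Hypothetical record: the source the distro built, and the hash it recorded.
    name: str
    version: str
    expected_sha256: str


def run_jobs(
    jobs: list[SourceJob], rebuild: Callable[[str, str], bytes]
) -> dict[str, str]:
    """Rebuild each source tarball and compare it with the distro's recorded hash."""
    results = {}
    for job in jobs:
        key = f"{job.name}-{job.version}"
        try:
            artifact = rebuild(job.name, job.version)
        except Exception as exc:  # a failed rebuild is reported, not fatal
            results[key] = f"FAIL ({exc})"
            continue
        digest = hashlib.sha256(artifact).hexdigest()
        results[key] = "GOOD" if digest == job.expected_sha256 else "BAD"
    return results
```

Separating FAIL from BAD matters here. A missing upstream tag is a lookup problem. A hash mismatch means the published tarball could not be reproduced from the sources.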

Sometimes the version tag used in distros doesn’t match the upstream release tag. repology.org could be used to get the source repo URL.
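
Before looking up a tag, the distro version string usually has to be stripped of its packaging decorations. The sketch below assumes Debian-style versions (`epoch:upstream-revision`, optionally with repack suffixes like `+dfsg`). It only guesses tag names and leaves finding the repository URL aside. The tag patterns it tries are common conventions, not a rule, and the function names are illustrative.

```python
import re


def upstream_version(distro_version: str) -> str:
    """Strip Debian-style decorations from a distro version string.

    Handles an epoch ("1:"), a Debian revision ("-2") and repack suffixes such
    as "+dfsg" or "+ds1". Other distros use other conventions.
    """
    version = distro_version.split(":", 1)[-1]  # drop epoch, if any
    version = version.rsplit("-", 1)[0]  # drop Debian revision, if any
    version = re.sub(r"\+(dfsg|ds|repack)\d*$", "", version)  # drop repack marker
    return version


def candidate_tags(name: str, distro_version: str) -> list[str]:
    """Upstream tag names worth trying, most common convention first."""
    v = upstream_version(distro_version)
    return [f"v{v}", v, f"{name}-{v}"]
```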

We don’t care about all tarballs. Only about those tarballs which were used in distro builds.

Example of source hash information for distributions:

=== See also

https://whatsrc.org/

A blog post breaking down the xz package in Debian which gives insight into Debian packaging and upstream sources: https://optimizedbyotto.com/post/xz-backdoor-debian-git-detection/