Formal definition
Most free software distributions are self-contained: all tools required to build their components are part of the distribution. In such cases, it is possible to specify the build environment in a machine-readable format that can later be used to reinstall the environment.
It is also useful for the relationships between these parts to be discoverable. Currently this only works within each software distribution; even across distributions that use the same formats (e.g. Debian and Ubuntu) it does not yet work out of the box. But builds often use other build artifacts as source inputs.
For example, a container image is created by installing Debian packages; those packages are often built from tar archives, which in turn are generated from some git source repository, and some git repositories contain code that developers generated with a build step. So to recursively verify that a specific artifact reproduces, one needs to identify all the sources, mirror them, and run build jobs across distributions. For container images, it is currently not specified how to look up the source of an artifact, or how to embed an identifier for the source in the artifact. And for many artifact types, looking up the source control revision is guesswork rather than a specification that can be followed, let alone something that is verified when a change is submitted to a distribution.
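The recursive verification described above amounts to walking a graph of "built from" relations until only primary sources remain. A minimal Python sketch, assuming such a graph could be obtained at all (which, as noted, is not specified for many artifact types); the artifact ids and the convention that a missing entry means "provenance unknown" are hypothetical:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical provenance graph: artifact id -> ids of the inputs it was built from.
#   empty list      -> a primary source (e.g. a git revision), nothing to rebuild
#   key not present -> provenance unknown; finding the source would be guesswork
Graph = Dict[str, List[str]]


def plan_rebuild(graph: Graph, root: str) -> Tuple[List[str], Set[str], Set[str]]:
    """Walk all transitive inputs of `root`.

    Returns (rebuild order with inputs first, sources to mirror, unresolved artifacts).
    """
    rebuild: List[str] = []
    sources: Set[str] = set()
    unknown: Set[str] = set()
    seen: Set[str] = set()

    def visit(node: str) -> None:
        if node in seen:  # shared inputs (and loops) are visited only once
            return
        seen.add(node)
        if node not in graph:
            unknown.add(node)
            return
        inputs = graph[node]
        if not inputs:
            sources.add(node)
            return
        for dep in inputs:
            visit(dep)
        rebuild.append(node)  # inputs handled first, so this artifact comes after them

    visit(root)
    return rebuild, sources, unknown


graph: Graph = {
    "image:app": ["deb:libfoo_1.0-1", "deb:bar_2.1-3"],
    "deb:libfoo_1.0-1": ["tar:libfoo-1.0.tar.gz"],
    "tar:libfoo-1.0.tar.gz": ["git:libfoo@abc123"],
    "git:libfoo@abc123": [],
    # "deb:bar_2.1-3" has no recorded provenance
}
order, to_mirror, unresolved = plan_rebuild(graph, "image:app")
print(order)       # ['tar:libfoo-1.0.tar.gz', 'deb:libfoo_1.0-1', 'image:app']
print(to_mirror)   # {'git:libfoo@abc123'}
print(unresolved)  # {'deb:bar_2.1-3'}
```

Anything that ends up in the unresolved set is exactly where the chain breaks and a recursive rebuild cannot proceed without guessing.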
Files that serve this goal are sometimes called a software bill of materials (SBOM). Various formats are in use or have been proposed for this:
- Debian Buildinfo specification (example Debian Buildinfo file)
- openSUSE OBS Buildinfo documentation with example
- SLSA provenance specification (example of a SLSA provenance file produced by openSUSE OBS)
- in-toto Attestation specification, which can include multiple related formats
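An in-toto statement uses one common wrapper for different formats: `subject` names the artifacts by digest, and `predicateType` says which format the `predicate` holds (SLSA provenance here, but other predicate types exist). A minimal Python sketch, assuming the Statement v1 and SLSA provenance v1 field names; the build type, builder id, repository URL and commit are hypothetical placeholders, and the signing step (typically a DSSE envelope around the statement) is omitted:

```python
import hashlib
import json
from typing import Any, Dict


def make_statement(name: str, content: bytes, predicate_type: str,
                   predicate: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a predicate in an in-toto Statement about one artifact."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": name, "digest": {"sha256": hashlib.sha256(content).hexdigest()}}
        ],
        "predicateType": predicate_type,
        "predicate": predicate,
    }


# Minimal SLSA v1 provenance predicate; buildType, builder id, repository URL
# and commit below are hypothetical placeholders.
provenance = {
    "buildDefinition": {
        "buildType": "https://example.org/hypothetical-build-type/v1",
        "externalParameters": {"source": "git+https://example.org/foo.git"},
        "resolvedDependencies": [
            {"uri": "git+https://example.org/foo.git",
             "digest": {"gitCommit": "0123456789abcdef0123456789abcdef01234567"}},
        ],
    },
    "runDetails": {"builder": {"id": "https://example.org/hypothetical-builder"}},
}

artifact = b"contents of foo_1.0-1_amd64.deb"  # stand-in bytes for the real file
statement = make_statement("foo_1.0-1_amd64.deb", artifact,
                           "https://slsa.dev/provenance/v1", provenance)
print(json.dumps(statement, indent=2, sort_keys=True))
```

Because the subject is identified by its hash, anyone holding the artifact can find the matching statement and, through `resolvedDependencies`, the source revision it claims to come from.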
As an example, the .buildinfo control files used by Debian specify the sources, the generated binaries, and all packages used to perform the build (with their exact version numbers). Reproducers sign these files to attest what their build result was. (This alone does not show that the build is reproducible, as every repeated build could produce a different result.)
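Reading the recorded build environment back out of such a file is straightforward, since .buildinfo files use the deb822 "Field: value" layout. A minimal Python sketch, assuming an abbreviated, unsigned excerpt with illustrative package versions (real files carry more fields and are usually wrapped in an OpenPGP signature); the helper names are hypothetical:

```python
import re
from typing import Dict

# Abbreviated, unsigned excerpt in the .buildinfo layout; versions are illustrative.
BUILDINFO = """\
Format: 1.0
Source: hello
Binary: hello
Architecture: amd64
Version: 2.10-3
Installed-Build-Depends:
 autoconf (= 2.71-3),
 gcc-12 (= 12.2.0-14),
 libc6 (= 2.36-9)
"""


def parse_stanza(text: str) -> Dict[str, str]:
    """Parse a single deb822 stanza; continuation lines are joined with newlines."""
    fields: Dict[str, str] = {}
    key = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if not key:
                raise ValueError("continuation line before any field")
            fields[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def exact_pins(relations: str) -> Dict[str, str]:
    """Turn 'pkg (= version), ...' into {pkg: version}."""
    pins: Dict[str, str] = {}
    for item in relations.replace("\n", " ").split(","):
        m = re.fullmatch(r"(\S+) \(= (\S+)\)", item.strip())
        if m:
            pins[m.group(1)] = m.group(2)
    return pins


info = parse_stanza(BUILDINFO)
pins = exact_pins(info["Installed-Build-Depends"])
print(info["Source"], info["Version"])  # hello 2.10-3
print(" ".join(f"{pkg}={ver}" for pkg, ver in pins.items()))
# autoconf=2.71-3 gcc-12=12.2.0-14 libc6=2.36-9
```

`apt-get install` accepts `name=version` arguments, so the last line could be used to reinstall exactly these versions, provided the configured repositories still carry them.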
A Debian binary package contains the name and version of the source it was built from. A Debian package repository contains a file listing the hashes, names, and versions of the included packages. Every update of the Debian repositories is archived so that package builds can be reproduced later.
This ensures that, starting from the artifact itself, the hash of an artifact, the repository identifier plus the name and version of a binary, or the Buildinfo file, one can find the exact source code and build environment that were used to build it.
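The step from an artifact's hash to its source can be sketched as a lookup in the repository index. A minimal Python sketch, assuming an uncompressed `Packages` index already held as text; the two stanzas, their versions and their hashes are made-up placeholders, and the function names are hypothetical. The resulting source name and version would then be looked up in the archived repository state:

```python
import re
from typing import Dict, Iterator, Optional, Tuple

H1, H2 = "1" * 64, "2" * 64  # placeholder SHA-256 values, not real packages

# Two stanzas in the layout of a Debian "Packages" index (fields abbreviated).
PACKAGES = f"""\
Package: libc6
Source: glibc
Version: 2.36-9
Architecture: amd64
SHA256: {H1}

Package: foo
Source: foo (1.0-1)
Version: 1.0-1+b1
Architecture: amd64
SHA256: {H2}
"""


def stanzas(text: str) -> Iterator[Dict[str, str]]:
    """Yield each paragraph as a dict; continuation lines are ignored in this sketch."""
    for block in text.strip().split("\n\n"):
        fields: Dict[str, str] = {}
        for line in block.splitlines():
            if line.startswith((" ", "\t")):
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        yield fields


def source_for_hash(index: str, sha256: str) -> Optional[Tuple[str, str]]:
    """Map a binary package hash to (source name, source version)."""
    for fields in stanzas(index):
        if fields.get("SHA256") != sha256:
            continue
        # "Source" may be absent (source name == package name) or carry the
        # source version in parentheses when it differs from the binary version.
        src = fields.get("Source", fields["Package"])
        m = re.fullmatch(r"(\S+) \((\S+)\)", src)
        if m:
            return m.group(1), m.group(2)
        return src, fields["Version"]
    return None


print(source_for_hash(PACKAGES, H1))        # ('glibc', '2.36-9')
print(source_for_hash(PACKAGES, H2))        # ('foo', '1.0-1')
print(source_for_hash(PACKAGES, "0" * 64))  # None
```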
Distributions that are built in a rolling way are more complicated to reproduce than a distribution that is built only with the package versions it contains itself. For the latter, the build dependencies can simply be installed from the distribution itself. That such a distribution is self-contained (though with self-referential dependency loops) and reproducible can be ensured by bootstrapping from outside binaries, rebuilding everything again, and keeping only the rebuilt binaries. But for rolling distributions, the dependencies are in some past version of the distribution, not necessarily in the current one. So to install the build dependencies, one needs to either search an archive for a repository version that included all of the required versions, install the necessary binaries directly, or assemble a new package repository from just these versions taken from the archive.
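The first and last options above can be sketched as two searches over an archive of repository snapshots. A minimal Python sketch, assuming a simplified, hypothetical archive model in which each snapshot offers exactly one version per package and timestamps are fixed-width strings that sort chronologically; the package names and versions are made up:

```python
from typing import Dict, Optional

# Hypothetical archive of a rolling distribution:
# snapshot timestamp -> {package: version available in that snapshot}.
Archive = Dict[str, Dict[str, str]]


def find_snapshot(archive: Archive, needed: Dict[str, str]) -> Optional[str]:
    """Return the oldest snapshot that carries every needed version, if any."""
    for ts in sorted(archive):
        if all(archive[ts].get(pkg) == ver for pkg, ver in needed.items()):
            return ts
    return None


def assemble(archive: Archive, needed: Dict[str, str]) -> Dict[str, str]:
    """Fallback: pick, per package, the oldest snapshot carrying the needed version."""
    picks: Dict[str, str] = {}
    for pkg, ver in needed.items():
        for ts in sorted(archive):
            if archive[ts].get(pkg) == ver:
                picks[pkg] = ts
                break
        else:
            raise LookupError(f"{pkg} {ver} is not in any archived snapshot")
    return picks


archive: Archive = {
    "20240101T000000Z": {"gcc": "13.1-1", "make": "4.3-4"},
    "20240201T000000Z": {"gcc": "13.2-1", "make": "4.3-4"},
    "20240301T000000Z": {"gcc": "13.2-1", "make": "4.4-1"},
}
print(find_snapshot(archive, {"gcc": "13.2-1", "make": "4.3-4"}))  # 20240201T000000Z
needed = {"gcc": "13.1-1", "make": "4.4-1"}
print(find_snapshot(archive, needed))  # None
print(assemble(archive, needed))
# {'gcc': '20240101T000000Z', 'make': '20240301T000000Z'}
```

When no single snapshot matches, the per-package picks from `assemble` say which archived files would have to be collected into a new, purpose-built repository.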