The idea of reproducible builds is to empower anyone to verify that no flaws have been introduced during the build process by reproducing byte-for-byte identical binary packages from a given source.
Achieving reproducible builds requires cooperation from multiple roles involved in software production. On small projects, all these roles might be filled by a single person, but it helps to differentiate the responsibilities.
Getting a deterministic build system
In order to allow software to build reproducibly, the source code must not introduce uncontrollable variations in the build output.
It is better to discover such variations before users are confronted with unreproducible binaries. Setting up a test protocol in which rebuilds are performed under variations in the environment (aspects like time, username, CPU, system version, and filesystems, amongst many others) will greatly help. A recent empirical study identifies 16 variations in the environment and organizes them into a taxonomy.
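A test protocol of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `leaky_build` and `clean_build` are hypothetical stand-ins for a real build command, and the two environments vary only the username and timezone, whereas a real protocol would vary many more aspects.

```python
import hashlib


def leaky_build(source: bytes, env: dict) -> bytes:
    # Hypothetical build step that leaks the user name into its output.
    return source + b"\nbuilt-by: " + env["USER"].encode()


def clean_build(source: bytes, env: dict) -> bytes:
    # Hypothetical build step whose output depends only on the source.
    return b"compiled:" + source


def group_by_digest(build, source: bytes, environments: dict) -> dict:
    """Run the build once per environment and group environment names by output digest."""
    groups = {}
    for name, env in environments.items():
        digest = hashlib.sha256(build(source, env)).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups


def is_reproducible(build, source: bytes, environments: dict) -> bool:
    """The build passes only if every environment produced the same bytes."""
    return len(group_by_digest(build, source, environments)) == 1


# Illustrative environments; the names and values are assumptions.
ENVIRONMENTS = {
    "alice-utc": {"USER": "alice", "TZ": "UTC"},
    "bob-tokyo": {"USER": "bob", "TZ": "Asia/Tokyo"},
}
```

Calling `is_reproducible(leaky_build, b"src", ENVIRONMENTS)` returns `False` because the username ends up in the output, while `clean_build` passes. When a build fails, `group_by_digest` shows which environments disagree, which helps narrow down the variation responsible.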
Defining a build environment
As different versions of compilation tools are likely to produce different outputs, users must be able to recreate a build environment close enough to the original build. It is not required that the toolchain[1] itself is byte-for-byte identical, but its output has to stay the same.
The build environment can either be defined while the software is being developed or it can be recorded at build time.
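Recording the environment at build time could look roughly like the following Python sketch. The field names and the `tools` mapping are assumptions made for illustration, not a format prescribed by this document. A real record would list the actual toolchain components and their versions.

```python
import json
import platform


def record_build_environment(tools=None) -> str:
    """Return a JSON description of the current build environment.

    `tools` is a hypothetical mapping of tool name to version string,
    supplied by the caller (for example, gathered from the build system).
    """
    record = {
        "system": platform.system(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "tools": dict(tools or {}),
    }
    # Sorted keys keep the record itself stable from one run to the next.
    return json.dumps(record, indent=2, sort_keys=True)
```

Writing the result to a file next to the build output, for instance with `record_build_environment({"example-compiler": "1.2.3"})`, gives later rebuilders a starting point for recreating the environment.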
Distributing the build environment
Users need to know which build environment to set up in order to rebuild the software.
If the build environment is defined ahead of time and is part of the source code, then no further steps are required.
In other cases, it needs to be made available alongside the binaries. The ideal form is a description that can be understood by both humans and machines to make automatic verification possible, while enabling people to review that the environment is sane.
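As a rough illustration of automatic verification, the Python sketch below compares a published environment description with the environment found on the rebuilder's machine. The flat JSON layout and the tool names are hypothetical, chosen only to keep the example short.

```python
import json


def environment_mismatches(recorded_json: str, actual: dict) -> dict:
    """Return {key: (recorded, actual)} for every entry that differs.

    An empty result means the local environment matches the description.
    """
    recorded = json.loads(recorded_json)
    return {
        key: (value, actual.get(key))
        for key, value in recorded.items()
        if actual.get(key) != value
    }


# Hypothetical description shipped alongside the binaries.
PUBLISHED = '{"example-compiler": "1.2.3", "example-linker": "4.5"}'
```

Because the description is plain JSON, a reviewer can read it directly, and a script can call `environment_mismatches(PUBLISHED, local_versions)` before starting a rebuild.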
Providing a comparison protocol
Users must have an easy way to recreate the build environment, get the source code, perform the build, and compare the results.
Ideally, the comparison protocol to verify that resulting binaries are identical should be simple. Comparing bytes or cryptographic hash values is easy to do and understand.
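A hash-based comparison can be as small as the following Python sketch, which uses SHA-256 from the standard library. The function names are illustrative, and the file paths are whatever the rebuilder chooses.

```python
import hashlib


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifacts_match(original_path, rebuilt_path) -> bool:
    """True when both files have the same SHA-256 digest."""
    return sha256_of(original_path) == sha256_of(rebuilt_path)
```

Publishing the digest alongside the binary lets anyone run the same check without downloading the original artifact a second time.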
Other technologies might require removing cryptographic signatures or ignoring specific parts of the output. Such operations must be both documented and scripted. The rationale and code must be easy for reviewers to understand.
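One way to script such an operation is sketched below in Python: the listed byte ranges are blanked out before hashing, so that only the remaining bytes are compared. The function name and the range-based approach are assumptions for illustration. A real format would need its own, documented way of locating the signature or other ignored part.

```python
import hashlib


def digest_ignoring(data: bytes, ignored_ranges) -> str:
    """SHA-256 of `data` with each (start, end) byte range zeroed out.

    Ranges are half-open, like Python slices. Out-of-bounds ranges are
    rejected so that a mistake cannot silently change what is compared.
    """
    buffer = bytearray(data)
    for start, end in ignored_ranges:
        if not 0 <= start <= end <= len(buffer):
            raise ValueError(f"range ({start}, {end}) is outside the data")
        buffer[start:end] = bytes(end - start)
    return hashlib.sha256(bytes(buffer)).hexdigest()
```

Keeping the list of ignored ranges explicit makes it possible for a reviewer to see exactly which bytes are excluded from the comparison and to question each exclusion.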
[1] By toolchain, we mean all pieces of software needed to create the build output.