Stripping of unreproducible information
In addition to handling timestamps, another crucial aspect of achieving reproducible builds is the removal of “useless” and unreproducible information from the build artifacts. This information often includes metadata, such as file ownership or access times, which can vary depending on the build environment or the specific conditions under which the build occurs. If left unaddressed, these variations can lead to inconsistencies in the final output, making the build non-reproducible.
Metadata are best avoided
Metadata such as file ownership, permissions, or incidental fields that some file formats record can introduce variability.
For instance, many build tools and file formats capture the user ID or group ID of the account running the build. Two builds that are otherwise identical but run by different users can then produce different outputs. Stripping or standardizing this metadata ensures that build outputs stay consistent regardless of the environment.
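As an illustration of standardizing ownership metadata at the point where an archive is created, here is a minimal sketch using Python's standard `tarfile` module. The function names `normalize_tarinfo` and `build_tar` are hypothetical. The sketch assumes an uncompressed archive in the fixed USTAR format, which avoids optional PAX extended headers. It resets uid/gid and user/group names, replaces the recorded modification time with a caller-supplied value, reduces permissions to 0644 or 0755, and adds members in sorted order.

```python
import io
import tarfile
import time


def normalize_tarinfo(info: tarfile.TarInfo, mtime: int) -> tarfile.TarInfo:
    """Replace environment-dependent metadata with fixed values."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = mtime
    # Keep only "executable or not"; drop differences caused by the umask.
    info.mode = 0o755 if info.mode & 0o111 else 0o644
    return info


def build_tar(files: dict[str, bytes], uid: int, user: str, mtime: int) -> bytes:
    """Archive `files`, normalizing the metadata a naive build would leak."""
    buf = io.BytesIO()
    # USTAR never writes PAX extended headers, so the header layout is fixed.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o664
            # What a naive build would record from the build machine:
            info.uid = info.gid = uid
            info.uname = info.gname = user
            info.mtime = int(time.time())
            tar.addfile(normalize_tarinfo(info, mtime), io.BytesIO(data))
    return buf.getvalue()
```

Running `build_tar` as two different users, and at two different times, yields byte-identical archives as long as `mtime` is the same. In a real build that value would typically come from `SOURCE_DATE_EPOCH`.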
External tools
To tackle this issue, the strip-nondeterminism tool was created. It automatically removes or normalizes non-deterministic information in various types of files, such as archives, PDFs, and JAR files. It can clamp timestamps, strip unnecessary metadata, and perform other normalizations that ensure the build outputs remain identical across different environments.
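To show what "normalizing" a ZIP-based file such as a JAR can involve, here is a minimal Python sketch using the standard `zipfile` module. It is not how strip-nondeterminism itself is implemented, and the function names are hypothetical. The sketch rewrites an existing archive with sorted entries, one fixed timestamp (clamped to 1980, the earliest date the ZIP format can store), a fixed "created on Unix" marker, and permissions reduced to 0644 or 0755. It assumes that deflate output is stable for a given zlib version.

```python
import io
import time
import zipfile

DOS_EPOCH = 315532800  # 1980-01-01T00:00:00Z, earliest date ZIP can store


def dos_date_time(timestamp: int) -> tuple[int, ...]:
    """Convert a Unix timestamp to a ZIP date_time tuple (UTC), clamped to 1980."""
    return tuple(time.gmtime(max(timestamp, DOS_EPOCH))[:6])


def normalize_zip(data: bytes, timestamp: int) -> bytes:
    """Rewrite a ZIP/JAR with sorted entries and fixed per-entry metadata."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for name in sorted(src.namelist()):
            old = src.getinfo(name)
            info = zipfile.ZipInfo(name, date_time=dos_date_time(timestamp))
            info.create_system = 3  # always record "Unix", whatever the host OS
            if old.is_dir():
                info.external_attr = (0o40755 << 16) | 0x10  # Unix dir + MS-DOS dir flag
                info.compress_type = zipfile.ZIP_STORED
                dst.writestr(info, b"")
            else:
                executable = (old.external_attr >> 16) & 0o111
                info.external_attr = (0o755 if executable else 0o644) << 16
                info.compress_type = zipfile.ZIP_DEFLATED
                dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives with the same contents but different entry order and timestamps normalize to identical bytes when passed the same `timestamp`.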
strip-nondeterminism is particularly useful when it’s not feasible to modify the build process itself to eliminate the source of non-determinism. By running it as a post-processing step in your build pipeline, you can fix many reproducibility issues after the fact. Removing the source of non-determinism in the build itself remains the more robust fix when it is possible.
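One of the normalizations mentioned above, clamping timestamps, is easy to apply as a post-processing step on a build output directory. The following is a minimal Python sketch of that step, not part of strip-nondeterminism. The function name `clamp_mtimes` is hypothetical. It lowers every file or directory modification time that is newer than a given epoch (typically `SOURCE_DATE_EPOCH`) down to that epoch, leaves older timestamps untouched, and skips symbolic links so that their targets are not modified.

```python
import os
from pathlib import Path


def clamp_mtimes(root: Path, epoch: int) -> list[Path]:
    """Lower every mtime newer than `epoch` to `epoch`; return the changed paths."""
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_symlink():
            continue  # os.utime would follow the link and touch its target
        if path.stat().st_mtime > epoch:
            os.utime(path, (epoch, epoch))  # sets both atime and mtime
            changed.append(path)
    return changed
```

Clamping, rather than overwriting every timestamp, keeps files that legitimately predate the release (for example, unchanged vendored sources) at their original times.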
For Android APKs, the reproducible-apk-tools project provides functionality similar to strip-nondeterminism: it removes or normalizes non-deterministic data specific to APK files so that they can be reproduced consistently.
By diligently stripping away unreproducible information and using tools like strip-nondeterminism, you can significantly improve the reproducibility of your builds and move closer to the goal that the same source code produces identical results regardless of where or when it is built.
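A simple way to check whether stripping has achieved its goal is to build twice, in different environments, and compare the outputs file by file. The following minimal Python sketch does this with SHA-256 digests. The function names `digest_tree` and `diff_builds` are hypothetical. The sketch compares file contents only, so on-disk metadata such as mtimes or ownership is ignored unless it ends up inside a file, for example in an archive header.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each regular file's relative POSIX path to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_builds(a: Path, b: Path) -> list[str]:
    """Return the relative paths that are missing from, or differ between, two trees."""
    da, db = digest_tree(a), digest_tree(b)
    return sorted(name for name in da.keys() | db.keys() if da.get(name) != db.get(name))
```

An empty result means the two builds are bit-for-bit identical. Any path that is listed points at the artifact whose remaining non-determinism still needs to be tracked down.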