Version information
Version information embedded in the software needs to be made deterministic. Counter-examples include embedding the current date or an incrementing build counter.
The date and time of the build itself are of little value, since old source code can always be compiled long after it has been released. Version information is most useful when it clearly indicates which source code was built.
The version number can come from a dedicated source file, a changelog, or from a version control system. If a date is needed, it can be extracted from the changelog or the version control system.
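As an illustration of taking both the version and the date from a changelog, here is a minimal Python sketch. It assumes a Debian-style changelog layout; the sample text, the regular expressions and the function name version_info_from_changelog are all hypothetical and would need adapting to a project's actual changelog format.

```python
import re
from email.utils import parsedate_to_datetime

# Hypothetical changelog; the Debian-style layout is an assumption.
SAMPLE_CHANGELOG = """\
hello (2.10-3) unstable; urgency=medium

  * Fix build with newer toolchain.

 -- Jane Doe <jane@example.org>  Mon, 26 Dec 2022 16:30:00 +0100

hello (2.10-2) unstable; urgency=low

  * Initial release.

 -- Jane Doe <jane@example.org>  Sat, 02 Jan 2021 10:00:00 +0000
"""

# Entry header, e.g. "hello (2.10-3) unstable; urgency=medium"
HEADER_RE = re.compile(r"^(?P<name>\S+) \((?P<version>[^)]+)\)")
# Entry trailer, e.g. " -- Name <mail>  Mon, 26 Dec 2022 16:30:00 +0100"
TRAILER_RE = re.compile(r"^ -- .*?>  (?P<date>.+)$")


def version_info_from_changelog(text):
    """Return (version, unix_timestamp) for the topmost changelog entry."""
    version = None
    for line in text.splitlines():
        if version is None:
            # Still looking for the first entry header.
            match = HEADER_RE.match(line)
            if match:
                version = match.group("version")
            continue
        # Header found; the next trailer line carries the entry date.
        match = TRAILER_RE.match(line)
        if match:
            when = parsedate_to_datetime(match.group("date"))
            return version, int(when.timestamp())
    raise ValueError("no complete changelog entry found")
```

Because both values are read from the source tree, rebuilding the same source later yields the same version and date, whatever the wall-clock time of the build.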
Git checksums
Cryptographic checksums from revision control systems can be used to identify source content. Git commit IDs are thus a good candidate to include as part of version information.
However, abbreviated Git hash identifiers (such as those obtained via git describe or git rev-parse --short) can be a source of non-reproducibility. This is because the number of hexadecimal characters in the abbreviated hash depends on the number of objects in the Git repository.
The number of objects will not only change over time (due to other commits, even those not on the primary development branch), but it will also change dramatically if a ‘shallow’ clone is made (see git-clone(1)), since these have fewer objects by design. To quote from the git-config(1) manual page:
core.abbrev
Set the length object names are abbreviated to. If unspecified or set to “auto”, an appropriate value is computed based on the approximate number of packed objects in your repository, which hopefully is enough for abbreviated object names to stay unique for some time. If set to “no”, no abbreviation is made and the object names are shown in their full length. The minimum length is 4.
Therefore, it is recommended that a fixed (or “no”) truncation is specified when obtaining identifiers by using, for example, git describe --abbrev=12, git rev-parse --short=12 HEAD or the core.abbrev config setting.
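One way to apply this from a build script is sketched below in Python. Rather than passing --short=12, this variant asks Git for the full commit ID and truncates it itself, which is a swapped-in alternative to the flags above. The names ABBREV_LENGTH, abbreviate and head_commit_id are hypothetical, and the sketch assumes git is on the PATH and that repo_dir is a Git working tree.

```python
import subprocess

# Fixed abbreviation length, independent of how many objects the repository has.
ABBREV_LENGTH = 12


def abbreviate(full_hash, length=ABBREV_LENGTH):
    """Truncate a full hexadecimal object name to a fixed number of characters."""
    if len(full_hash) < length:
        raise ValueError("hash is shorter than the requested abbreviation")
    return full_hash[:length]


def head_commit_id(repo_dir="."):
    """Return a fixed-length identifier for the HEAD commit of repo_dir."""
    # "git rev-parse HEAD" prints the full, unabbreviated commit ID.
    full_hash = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    return abbreviate(full_hash)
```

Truncating the full ID gives the same length in a full clone and in a shallow clone. A fixed-length prefix is not guaranteed to stay unique forever, so the chosen length is a trade-off between brevity and the risk of collisions.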