Timestamps
Timestamps are the biggest source of reproducibility issues. Many build tools record the current date and time. The filesystem does, and most archive formats will happily record modification times on top of their own timestamps. It is also customary to record the date of the build in the software itself…
Timestamps are best avoided
Often the time of the build was used as an approximate way to know which version of the source was built, and which tools were used to build it. With reproducible builds, recording the time of the build becomes meaningless: on the one hand, the source code needs to be tracked more accurately than with just a timestamp, and on the other hand, the build environment needs to be defined or extensively recorded.
If a date is required to give users an idea of when the software was made, it is better to use a date that is relevant to the source code instead of the build: old software can always be built later. Like version information, it’s best to extract such a date from the revision control system or from a changelog.
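As an illustration of taking the date from a changelog, here is a minimal sketch. It assumes a Debian-style changelog in which the newest entry comes first and each entry ends with a trailer line of the form ` -- Name <email>  RFC 2822 date`. The function name `changelog_epoch` is hypothetical, and other changelog formats would need different parsing.

```python
from email.utils import parsedate_to_datetime


def changelog_epoch(changelog_text: str) -> int:
    """Return the Unix timestamp of the newest entry in a Debian-style changelog.

    Assumes the newest entry comes first and that each entry ends with a
    trailer line such as:
     -- Jane Doe <jane@example.org>  Mon, 26 Oct 2015 00:00:00 +0000
    """
    for line in changelog_text.splitlines():
        if line.startswith(" -- "):
            # The date follows the closing '>' of the e-mail address
            # and two spaces.
            _, _, date = line.rpartition(">  ")
            return int(parsedate_to_datetime(date.strip()).timestamp())
    raise ValueError("no changelog trailer line found")
```

The resulting number of seconds could then be exported to the build, so that the embedded date depends on the source rather than on when the build was run.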
External tools
Some tools used in build processes, like code or documentation generators, embed timestamps in their output, which makes the build products unreproducible.
The Reproducible Builds effort proposed the SOURCE_DATE_EPOCH environment variable to address the problem. When it is set, tools that support it [1] use its value (a number of seconds since January 1st 1970, 00:00 UTC) instead of the current date and time. The variable has been formally defined in the hope of wider adoption.
(SOURCE_DATE_EPOCH was originally introduced by the Debian reproducible builds folks and has since been adopted widely.)
Changes required to support SOURCE_DATE_EPOCH are usually fairly small and easy to write. Patches for tools which don’t yet support the environment variable have usually been well received and help all users wanting reproducible builds.
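To give an idea of how small such a change can be, here is a minimal sketch of a tool choosing the date it embeds in its output. The function name `build_date` and its `environ` parameter are hypothetical; only the SOURCE_DATE_EPOCH variable and its meaning come from the text above.

```python
import os
import time
from datetime import datetime, timezone


def build_date(environ=os.environ) -> datetime:
    """Return the date a tool should embed in its output, in UTC.

    Uses SOURCE_DATE_EPOCH when it is set and falls back to the current
    time otherwise. A malformed value raises ValueError instead of being
    silently ignored.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    epoch = int(value) if value is not None else int(time.time())
    # Format in UTC so the result does not depend on the local timezone.
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

With SOURCE_DATE_EPOCH set to the same value, two runs of the tool would then write the same date regardless of when or where they run.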
Where that is not possible, an option is to post-process the output. The idea is to either remove the timestamps entirely or to normalize them to a predetermined date and time. strip-nondeterminism was designed as an extensible program to perform such normalization on various file formats.
Another option is to run these tools using libfaketime. This library is loaded through the LD_PRELOAD environment variable and intercepts function calls that retrieve the current time of day, replying with a predefined date and time instead. In some cases, it works just fine and can solve problems without requiring many changes to a given build system. But if any part of the build process relies on time differences, things will go wrong. One case of bad interaction between libfaketime and parallel compilation has been identified as a source of reproducibility issues in the Tor Browser. So beware.
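As a rough sketch of what running a tool under libfaketime involves, the helper below prepares the environment for a child process. The function name `faketime_env` is hypothetical, the location of the library varies between systems and must be supplied by the caller, and the `FAKETIME` variable is used here on the assumption that the installed libfaketime reads its frozen date from it in `YYYY-MM-DD hh:mm:ss` form.

```python
import os


def faketime_env(library_path: str, frozen_time: str, base_env=None) -> dict:
    """Return a copy of the environment set up to preload libfaketime.

    library_path: location of the libfaketime shared object (system-specific).
    frozen_time:  date and time the child process should see,
                  for example "2015-10-26 00:00:00".
    """
    env = dict(os.environ if base_env is None else base_env)
    preload = env.get("LD_PRELOAD")
    # Keep any libraries that were already being preloaded.
    env["LD_PRELOAD"] = f"{library_path} {preload}" if preload else library_path
    env["FAKETIME"] = frozen_time
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the tool. Because the clock appears frozen to the child process, anything that depends on elapsed time or on comparing fresh timestamps may misbehave, as described above.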
[1] As of 2015-10-26, the following tools are known to support SOURCE_DATE_EPOCH: help2man, Sphinx. Others have already been modified locally in Debian to support it.