Most archive formats record metadata that will capture details about the build environment if no care is taken. File last modification time is obvious, but file ordering, users, groups, numeric ids, and permissions can also be of concern. Tar will be used as the main example, but these tips apply to other archive formats as well.
File modification times
Most archive formats will, by default, record file last modification times, while some will also record file creation times.
GNU Tar has a way to specify the modification time that is used for all archive members: the --mtime option.
A trailing Z in the date given to --mtime (for example '2015-10-21 00:00Z') specifies that the time is in UTC.
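A minimal sketch of --mtime in use, assuming GNU Tar. The build directory and its single file are hypothetical stand-ins for real build output, created in a scratch directory only for illustration:

```shell
set -eu
# Scratch directory with a hypothetical "build" output tree.
cd "$(mktemp -d)"
mkdir build
echo hello > build/file.txt

# Record the same fixed time for every member; the trailing Z means UTC.
tar --mtime='2015-10-21 00:00Z' -cf product.tar build
```

Listing the archive with TZ=UTC tar --full-time -tvf product.tar should then show 2015-10-21 00:00:00 for every member, whatever the files' actual modification times were.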
For other archive formats, it is always possible to use touch to reset the modification times to a predefined value before creating the archive.
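A minimal sketch of the touch approach, assuming GNU coreutils and findutils (for touch --no-dereference, xargs -r and -print0). SOURCE_DATE_EPOCH is the conventional environment variable for a build's reference time; the fallback value used here is an arbitrary placeholder, and tar merely stands in for an archiver that lacks an --mtime-like option (zip, for instance):

```shell
set -eu
# Scratch directory with a hypothetical "build" output tree.
cd "$(mktemp -d)"
mkdir build
echo hello > build/file.txt

# Reference time: SOURCE_DATE_EPOCH if set, else an arbitrary placeholder
# (1445385600 = 2015-10-21 00:00:00 UTC).
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1445385600}"

# Reset the mtime of every entry, including the directory itself;
# --no-dereference touches symlinks rather than their targets.
find build -print0 |
    xargs -0r touch --no-dereference --date="@${SOURCE_DATE_EPOCH}"

# Any archiver can run now; tar stands in for one lacking --mtime (e.g. zip).
tar -cf product.tar build
```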
In some cases, it is preferable to keep the original times for files that have not been created or modified during the build process.
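One way to do this, sketched under the same assumptions (GNU findutils and coreutils, placeholder SOURCE_DATE_EPOCH), is to let find's -newermt test select only the entries more recent than the reference time. The "old" file and its 2001 timestamp are made up to simulate an input left untouched by the build:

```shell
set -eu
cd "$(mktemp -d)"
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1445385600}"   # placeholder fallback
mkdir build
# Hypothetical untouched input with an old time (2001-09-09 01:46:40 UTC).
echo old > build/old.txt
touch --date=@1000000000 build/old.txt
# Hypothetical file produced "now" by the build.
echo new > build/new.txt

# Only clamp entries newer than the reference time; older ones keep theirs.
find build -newermt "@${SOURCE_DATE_EPOCH}" -print0 |
    xargs -0r touch --no-dereference --date="@${SOURCE_DATE_EPOCH}"
```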
A patch has been written to simplify the latter operation with GNU Tar. It has been available in Debian since tar version 1.28-1 and was later included upstream in GNU Tar 1.29. It adds a new --clamp-mtime flag which will only set the time when the file is more recent than the value specified with --mtime.
This has the benefit of leaving the original modification time of older files untouched.
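A sketch of --clamp-mtime, assuming a GNU Tar that provides the flag (the patched Debian 1.28 package or a later upstream release). The file names, the old timestamp and the SOURCE_DATE_EPOCH fallback are placeholders:

```shell
set -eu
cd "$(mktemp -d)"
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1445385600}"   # placeholder fallback
mkdir build
echo old > build/old.txt
touch --date=@1000000000 build/old.txt   # predates SOURCE_DATE_EPOCH
echo new > build/new.txt                 # produced "now" by the build

# Members newer than --mtime are clamped to it; older members keep
# their own modification time.
tar --mtime="@${SOURCE_DATE_EPOCH}" --clamp-mtime -cf product.tar build
```

Unlike the find/touch variant, this leaves the files on disk unchanged and only affects what is recorded in the archive.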
File ordering
When asked to record directories, most archive formats will read their content in the order returned by the filesystem, which can differ from one run or machine to the next.
Since version 1.28, GNU Tar has provided the --sort=name option, which sorts filenames in a locale-independent manner.
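A minimal sketch of --sort=name, assuming GNU Tar 1.28 or later; the three files are hypothetical and created in reverse order so that the sorting is visible:

```shell
set -eu
cd "$(mktemp -d)"
mkdir build
# Create files in non-alphabetical order; readdir order is filesystem-dependent.
for name in c b a; do echo "$name" > "build/$name.txt"; done

# Members are stored sorted by name regardless of directory order.
tar --sort=name -cf product.tar build
```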
For older versions or other archive formats, it is possible to use sort to achieve the same effect.
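A sketch of the sort-based approach, assuming GNU find, sort and tar (for -print0, sort -z, --null and -T -). The list is passed with --no-recursion so that tar archives exactly the sorted entries instead of walking the directory again:

```shell
set -eu
cd "$(mktemp -d)"
mkdir build
for name in c b a; do echo "$name" > "build/$name.txt"; done

# List entries NUL-separated, sort them byte-wise in the C locale, and
# archive exactly that list without descending into directories.
find build -print0 | LC_ALL=C sort -z |
    tar --no-recursion --null -T - -cf product.tar
```

The same sorted list could be fed to another archiver that reads file names from standard input.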
Care must be taken to ensure that sort is called in the context of the C locale (e.g. by setting LC_ALL=C) to avoid any surprises related to collation order.
Users, groups and numeric ids
Depending on the archive format, the user and group owning the file can be recorded, sometimes as a name string and sometimes as the associated numeric ids.
When files belong to predefined system groups, this is not a problem, but builds are often performed with regular users. Recording the account name or its associated ids can therefore be a source of reproducibility issues.
Tar offers a way to specify the user and group owning the file. Using --owner=0 and --group=0 together with --numeric-owner is a safe bet, as it will effectively record 0 as values.
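A minimal sketch, assuming GNU Tar; the build directory is a hypothetical stand-in. Whoever runs it, the archive should record uid/gid 0 and no user or group names:

```shell
set -eu
cd "$(mktemp -d)"
mkdir build
echo hello > build/file.txt

# Force owner and group 0 and store only numeric ids, not account names.
tar --owner=0 --group=0 --numeric-owner -cf product.tar build
```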
PAX headers
GNU tar defaults to the pax format, and if POSIXLY_CORRECT is set, it adds the files' ctime and atime and the PID of the tar process as non-deterministic metadata.
To avoid this, either unset POSIXLY_CORRECT (this only works with tar > 1.32), or add --format=gnu to the tar call (only available in GNU tar), or use --format=ustar if the limitations of that format are not a problem.
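A sketch of the --format=gnu workaround, assuming GNU tar. POSIXLY_CORRECT is set on purpose, and the archive is created twice (by two processes with different PIDs) to check that the results are byte-for-byte identical:

```shell
set -eu
cd "$(mktemp -d)"
mkdir build
echo hello > build/file.txt

# Even with POSIXLY_CORRECT set, the GNU format stores no pax atime,
# ctime or PID, so two separate runs should produce identical archives.
export POSIXLY_CORRECT=1
tar --format=gnu -cf first.tar build
tar --format=gnu -cf second.tar build
```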
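The recommended way to create a Tar archive thus combines all of the above: sorted file names, a fixed modification time, owner and group 0 with numeric ids, and a format free of non-deterministic pax metadata.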
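A sketch of such a combined invocation, assuming GNU Tar 1.28 or later (for --sort=name). It uses --format=gnu as the way to keep pax atime, ctime and PID metadata out of the archive; --format=ustar would be an alternative where its limitations are acceptable. The build directory and SOURCE_DATE_EPOCH fallback are placeholders:

```shell
set -eu
cd "$(mktemp -d)"
mkdir build
echo hello > build/file.txt
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1445385600}"   # placeholder fallback

tar --sort=name \
    --mtime="@${SOURCE_DATE_EPOCH}" \
    --owner=0 --group=0 --numeric-owner \
    --format=gnu \
    -cf product.tar build
```

Touching the input files and re-running the same command should produce an identical archive, since neither their mtimes nor their ctimes end up in it.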
If tools do not support options to create reproducible archives, it is always possible to perform post-processing.
Static libraries
Static libraries (.a) on Unix-like systems are ar archives. Like other archive formats, they contain metadata, namely timestamps, UIDs, GIDs, and permissions. None of it is actually required for using them as libraries.
ar and other tools from binutils have a deterministic mode which will use zero for UIDs, GIDs and timestamps, and use consistent file modes for all files. It can be made the default by passing the --enable-deterministic-archives option to ./configure. It is already enabled by default for some distributions, and so far it seems to be pretty safe, except for Makefiles using archive member targets of the form archive(member), since make relies on the timestamps stored for each member.
When binutils is not built with deterministic archives by default, build systems have to be changed to pass the right options to ar. ARFLAGS can be set to Dcvr with many build systems to turn on the deterministic mode. Care must also be taken to pass -D to ranlib if it is used to create the symbol index.
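A sketch of the D modifier, assuming GNU ar from binutils. The plain text members and their names are hypothetical stand-ins for object files; the second archive is built after changing one member's timestamp and another's permissions, and should still be identical:

```shell
set -eu
cd "$(mktemp -d)"
# Hypothetical members; a real static library would contain object files.
echo one > a.txt
echo two > b.txt

# D = deterministic: zero UIDs, GIDs and timestamps, consistent file modes.
# (If ranlib is run afterwards, it would need -D as well.)
ar Dcvr libfoo.a a.txt b.txt
```

In a Makefile-based build, the same letters would typically go into ARFLAGS rather than being typed on the command line.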
Another option is post-processing with objcopy:
objcopy --enable-deterministic-archives libfoo.a
The above does not fix file ordering.
cpio archives
cpio archives are commonly used for initramfs images. The cpio header (see man 5 cpio) can contain device and inode numbers which, whilst deterministic, can vary from system to system.
One way to filter these out is by piping through bsdtar.
Example of non-deterministic code:
echo ucode.bin | bsdcpio -o -H newc -R 0:0 > ucode.img
Example of deterministic code:
echo ucode.bin | bsdtar --uid 0 --gid 0 -cnf - -T - | bsdtar --null -cf - --format=newc @- > ucode.img
Note that other issues such as timestamps may still require rectification prior to archival.
GNU Libtool
GNU Libtool prior to version 2.2.7b (the first version to include the fix) did not sort the find output. It appears that many packages are bootstrapped with a version prior to this.
Confusingly, although GCC's ltmain.sh claims to have been generated by libtool 2.2.7a, GCC actually maintains its own ltmain.sh, which fixed this issue independently in commit d41cd173e23. This change was first included in version 9.1.0, meaning that the reproducibility issue remains in GCC versions below 9.1.0.
Content licensed under CC BY-SA 4.0, style licensed under MIT. Templates and styles based on the Tor Styleguide. Logos and trademarks belong to their respective owners.