Software cannot easily be built reproducibly if the source varies depending on factors that are hard or impossible to control, such as the ordering of files on a filesystem or the current time.
Drawing the line
Which aspects of the build system need to be made deterministic depends closely on what is defined as part of the build environment.
For example, we assume that different versions of a compiler will produce different output, so use of a specific compiler version is mandated as part of the build environment. The same assumption does not necessarily hold for simpler tools like sed, where the requirement for the environment can be as loose as “any recent Unix-like system”.
But it is hardly a good idea to mandate that the system pseudo-random number generator be initialized with a given value before performing a build, so it is better not to let randomness affect the build output at all.
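One way to keep randomness out of build output is to derive any "unique" identifier a build step needs from its inputs instead of from a random source. The following is a minimal sketch of that idea, not a description of any particular tool; the helper name `stable_identifier` and its parameters are hypothetical.

```python
import hashlib


def stable_identifier(name, content, length=12):
    """Derive an identifier from the inputs instead of a random source.

    The same (name, content) pair always yields the same identifier,
    so repeated builds produce identical output.
    """
    digest = hashlib.sha256()
    digest.update(name.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", b"c") differs from ("a", b"bc")
    digest.update(content)
    return digest.hexdigest()[:length]
```

A build step that previously generated a random suffix for a temporary symbol or file name could call `stable_identifier("widget.c", source_bytes)` instead.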
Another concrete example of where to draw the line: if the build path is considered part of the build environment, so that rebuilds must be performed in the same directory as the original build, then there is no need to make the build system give constant output when run in different build paths.
In a nutshell
The basics of making a build system deterministic can be summarized as:
- Ensure stable inputs.
- Ensure stable outputs.
- Capture as little as possible from the environment.
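The three points above can be illustrated with a small Python sketch. It is only one possible approach under stated assumptions: the function names are hypothetical, `SOURCE_DATE_EPOCH` is used as the conventional environment variable for a fixed build timestamp, and the choice of which variables to keep (`PATH`) or pin (`LC_ALL`, `TZ`) is an example rather than a recommendation for every project.

```python
import os
import time


def stable_file_list(root):
    """Stable input: list files in sorted order, independent of the filesystem."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order of subdirectories
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.append(rel.replace(os.sep, "/"))
    return sorted(paths)


def build_timestamp(environ=None):
    """Stable output: prefer SOURCE_DATE_EPOCH over the current time."""
    source = os.environ if environ is None else environ
    value = source.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def minimal_environment(environ=None, keep=("PATH",)):
    """Capture little: pass only selected variables and pin locale and timezone."""
    source = os.environ if environ is None else environ
    env = {name: source[name] for name in keep if name in source}
    env["LC_ALL"] = "C"
    env["TZ"] = "UTC"
    return env
```

A build script could iterate over `stable_file_list(...)` when creating archives, embed `build_timestamp()` instead of the wall-clock time, and run child processes with `minimal_environment()`.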
What follows is some advice on common issues in source code or build systems that cause multiple builds from the exact same source to differ.
The default configuration of CMake makes the build directory part of the build environment. Here are some known issues and recommendations:
- CMake sets an RPATH for binaries that link to a library in the same project. Even when this is stripped at installation time, the build-id section will be different. Possible workarounds:
  - Users can set CMAKE_SKIP_RPATH=ON to disable the use of RPATH. Disadvantage: programs from the build directory cannot be run without setting LD_LIBRARY_PATH.
  - Projects can set CMAKE_BUILD_WITH_INSTALL_RPATH=ON to ensure a deterministic RPATH. Disadvantage: programs from the build directory cannot be run without setting LD_LIBRARY_PATH.
  - Set CMAKE_BUILD_RPATH_USE_ORIGIN=ON to enable the use of relative directories in RPATH (requires CMake 3.14). This is an appropriate option for both upstream projects and downstream distributions.
- Qt projects can use rcc to embed resources such as translations and images. Since Qt 5.8, rcc includes the file modification time of source files in the build output. This is especially problematic for translation files that are generated at build time. Possible workarounds:
  - (Since Qt 5.9) If a project does not rely on an accurate QFileInfo::lastModified, pass --format-version 1 to rcc. If CMake’s AUTORCC is enabled, this can be done by setting CMAKE_AUTORCC_OPTIONS to --format-version;1. Upstream projects are encouraged to do this after checking that Qt 5.9 or newer is in use.
  - (Since Qt 5.11) Set the QT_RCC_SOURCE_DATE_OVERRIDE environment variable, which behaves similarly to SOURCE_DATE_EPOCH.
  - (Since Qt 5.13) Set the SOURCE_DATE_EPOCH environment variable.
  - Ensure that generated source files are touched with a fixed timestamp before rcc is called. See also https://bugs.debian.org/894476.
- Qt projects that use Q_OBJECT macros require moc to generate additional C++ files. CMake will automatically do this when AUTOMOC is enabled, but then the relative path from the build directory to the source directory will become part of the build environment. For example, if the build directory is /tmp/build and the source file is at /tmp/foo/widget.h, then the generated file will include ../[...]/../foo/widget.h. Possible workarounds:
  - Use the -p option to override the include prefix. This requires the prefix plus the header filename to be available from the include path. See also https://gitlab.kitware.com/cmake/cmake/issues/18815.
  - Ensure that the build directory and source directory remain fixed across builds. For example, if users always create a build directory in the source tree, then reproducibility won’t be affected.
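As an illustration of how a downstream build script might apply some of these workarounds together, here is a hedged Python sketch. The function names are hypothetical, and the sketch assumes CMake 3.14 or newer (for `-S`, `-B` and `CMAKE_BUILD_RPATH_USE_ORIGIN`) and a Qt version whose rcc honours `--format-version 1`, `QT_RCC_SOURCE_DATE_OVERRIDE` or `SOURCE_DATE_EPOCH`; check these against the versions actually in use.

```python
import os


def reproducible_cmake_command(source_dir, build_dir):
    """Assemble a CMake configure command with reproducibility options."""
    return [
        "cmake", "-S", source_dir, "-B", build_dir,
        # Relative ($ORIGIN-based) RPATH in the build tree; needs CMake 3.14.
        "-DCMAKE_BUILD_RPATH_USE_ORIGIN=ON",
        # Ask rcc (via AUTORCC) not to embed file modification times.
        "-DCMAKE_AUTORCC_OPTIONS=--format-version;1",
    ]


def reproducible_environment(source_date_epoch, base=None):
    """Copy an environment and pin the timestamp variables read by rcc."""
    env = dict(os.environ if base is None else base)
    env["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
    env["QT_RCC_SOURCE_DATE_OVERRIDE"] = str(source_date_epoch)
    return env


def touch_with_fixed_timestamp(paths, source_date_epoch):
    """Give generated files a fixed access and modification time before rcc runs."""
    for path in paths:
        os.utime(path, (source_date_epoch, source_date_epoch))
```

The command and environment could then be passed to `subprocess.run(cmd, env=env, check=True)`. Passing the arguments as a list avoids having to quote the semicolon in `--format-version;1` for a shell.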
Not all problems currently have solutions. Some tools used in a build process may themselves require fixes to become deterministic. The Debian effort keeps a list of all issues found while investigating reproducibility problems in its 22,000+ source packages. While some require changes in the package source itself, others can be fixed by improving or fixing the tools used to perform the builds.