Controlling the build environment
45-minute + 10-minute session on day 1
Testing that a piece of software is reproducible means being able to control the build environment, so that we can reproduce variations that could later be found in the wild. A project could instead decide to make a piece of software reproducible by specifying precisely which environment it should be built in. Both approaches require essentially the same set of tools. The only difference is when they are used.
Non-exhaustive list of variations: username, hostname, build path, timezone, CPU, locale, filesystem ordering
Each project can approach this differently:
- some projects want to set up the same build environment for every developer
- other projects want different environments to produce the same builds. In that case, they need to test using different environments.
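As a minimal sketch of the second approach, the following Python runs the same build command under two environment variants and compares SHA-256 digests of its output. The variant values, the helper names (build_digest, is_reproducible) and the one-line stand-in builds are hypothetical. Only variations that can be set through environment variables (timezone, locale, username) are covered, and this is not how any of the tools named below actually work.

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical environment variants; a real harness would vary much more.
VARIANTS = [
    {"TZ": "UTC0", "LC_ALL": "C", "USER": "builder1"},
    {"TZ": "EST5EDT", "LC_ALL": "POSIX", "USER": "builder2"},
]


def build_digest(cmd, overrides):
    """Run cmd with environment overrides; return SHA-256 of its stdout."""
    env = dict(os.environ)
    env.update(overrides)
    result = subprocess.run(cmd, env=env, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(cmd):
    """True if every variant yields byte-identical output."""
    return len({build_digest(cmd, v) for v in VARIANTS}) == 1


# Stand-in "builds": the first leaks the timezone name into its output.
leaky = [sys.executable, "-c", "import time; print(time.tzname[0])"]
clean = [sys.executable, "-c", "print('hello')"]

print(is_reproducible(leaky))  # False
print(is_reproducible(clean))  # True
```

A real harness would hash the build artifacts rather than stdout, and would also vary the hostname, build path and other items that cannot be set through the environment.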
For Debian:
- sbuild/pbuilder: unpack a minimal chroot, install dependencies, and build
- a script called prebuild, which builds twice with different environments to find sources of non-determinism
- the srebuild script, which uses the list of packages involved in the build as initially recorded in .buildinfo files. We can use snapshot.debian.org to recreate the environment.
- on Debian, a separate system has been set up with its clock 399 days in the future to check reproducibility
- some packages contain X.509 certificates with valid-before and valid-after dates, which can cause failures when the clock is set in the future
- no network access in the builds
- issue: some libraries compile in optimisations that depend on the build CPU. For libgmp, the --enable-fat configure option avoids this: which optimisation to use is decided on the user side (at run time) instead of on the build side.
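The srebuild approach starts by reading the package list from a .buildinfo file. A minimal Python sketch, assuming packages appear as "name (= version)" continuation lines under an Installed-Build-Depends field (the field name used by current dpkg; older files may differ). The example content, package versions and helper name build_packages are hypothetical.

```python
import re

# Hypothetical .buildinfo excerpt (package versions are made up).
BUILDINFO = """\
Source: hello
Installed-Build-Depends:
 autoconf (= 2.69-9),
 base-files (= 9.4),
 gcc-5 (= 5.2.1-22)
Environment:
 LANG="C"
"""

# One continuation line: " name (= version)" with an optional trailing comma.
ENTRY = re.compile(r"^\s+(\S+) \(= ([^)]+)\),?$")


def build_packages(text, field="Installed-Build-Depends"):
    """Return the (name, version) pairs listed under a .buildinfo field."""
    packages = []
    in_field = False
    for line in text.splitlines():
        if line.startswith(field + ":"):
            in_field = True
        elif in_field:
            if not line.startswith(" "):
                break  # next field: continuation lines start with a space
            match = ENTRY.match(line)
            if match:
                packages.append((match.group(1), match.group(2)))
    return packages


# Arguments for "apt-get install", pinning each exact version.
apt_args = [f"{name}={version}" for name, version in build_packages(BUILDINFO)]
print(apt_args)
# ['autoconf=2.69-9', 'base-files=9.4', 'gcc-5=5.2.1-22']
```

Installing these exact versions only works if the archive still carries them, which is why snapshot.debian.org would need to be configured as an APT source.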
On FreeBSD:
- poudriere
- a tool to set up the build environment using jails
- very easy for users
- can set hostname, timezone, user inside the jail, build path, network, fake kernel version
- the only thing that cannot be changed from a jail: faking the time
- a lot of people use it to build because it's easy
For single projects:
- can be easier to use VM
- VM: requires a lot of trust in the VM, since we cannot yet build images reproducibly
- can use containers on Linux
Qubes:
- does not yet capture the environment
- builds with the same user and path
Google:
- build farm on Google internal network
- isolation is not meant to enforce things, but to avoid accidents
- Bazel, with sandboxing
- with Bazel, caching is used when building on the same machine
- builds are not yet reproducible across hosts and users
OS X:
- no jail, standard chroot
- problem: some standard Apple tools are broken in a chroot
- no easy way to change hostname and username
- best approach at the moment: using a VM
- MacPorts: uses LD_PRELOAD to hide some files, at a cost of 100% overhead
- Tor Browser: cross-compiles from Linux using the Apple SDK
- Homebrew: installations are user-relocatable, but binaries are not bit-identical because of embedded paths; sometimes grepping won't work
Faking the CPU can be tricky. One solution is to tell KVM to stop exposing the host CPU to the guest.
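Faking the CPU attacks the problem from the build side. The libgmp --enable-fat approach works from the user side instead: every CPU-specific variant is compiled in and one is selected at run time, so the build output no longer depends on the build machine. A toy Python sketch of that dispatch pattern follows. The function names, the single "avx2" entry in the variant table and the /proc/cpuinfo parsing (Linux on x86 only) are illustrative assumptions, not GMP's actual mechanism.

```python
def add_generic(a, b):
    """Portable implementation, always available."""
    return [x + y for x, y in zip(a, b)]


def add_avx2(a, b):
    """Stand-in for a hypothetical AVX2-optimised variant (same result)."""
    return [x + y for x, y in zip(a, b)]


# Ordered from most to least specialised; all variants ship in the build.
VARIANTS = [("avx2", add_avx2)]


def cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags seen at run time (Linux/x86 only)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def select_add(flags):
    """Pick the most specialised variant the running CPU supports."""
    for flag, impl in VARIANTS:
        if flag in flags:
            return impl
    return add_generic


# The choice happens on the user's machine, not on the build machine.
add = select_add(cpu_flags())
print(add([1, 2], [3, 4]))  # [4, 6]
```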
disorderfs, implemented with FUSE, allows testing for issues tied to filesystem ordering. By default it returns the results of readdir(3) in reverse order. It could be modified to always return readdir(3) results in sorted order, to implement a normalized environment.
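To illustrate why readdir(3) ordering matters, this Python sketch does not use disorderfs itself. It simulates the native, reversed (disorderfs' default) and sorted orderings in a hypothetical readdir helper, and shows that a build step which trusts directory order changes its output, while one that sorts explicitly does not.

```python
import os
import tempfile


def readdir(path, mode="native"):
    """Hypothetical helper simulating the order a build sees entries in.

    native:   whatever order the filesystem returns
    reversed: reversed, like disorderfs' default
    sorted:   sorted, like a normalized environment
    """
    names = os.listdir(path)
    if mode == "reversed":
        return names[::-1]
    if mode == "sorted":
        return sorted(names)
    return names


def naive_build(path, mode):
    # Non-deterministic: output follows directory order.
    return ",".join(readdir(path, mode))


def fixed_build(path, mode):
    # Deterministic: sorts explicitly instead of trusting readdir order.
    return ",".join(sorted(readdir(path, mode)))


def demo():
    with tempfile.TemporaryDirectory() as d:
        for name in ("b.c", "a.c", "c.c"):
            open(os.path.join(d, name), "w").close()
        return (
            naive_build(d, "native") == naive_build(d, "reversed"),
            fixed_build(d, "native") == fixed_build(d, "reversed"),
        )


print(demo())  # (False, True)
```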
Archive creation tools:
- GNU tar has an option to create reproducible tarballs
- bsdtar does not yet
- At Google, tar, zip, ar and rpm have been fixed; the changes are not yet upstreamed.
- lzma compression is threaded: the result depends on the number of CPUs
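An archive tool can avoid depending on the environment by sorting its entries and normalizing ownership, timestamps and permissions. A minimal sketch with Python's standard tarfile module follows. It is an illustration only, not GNU tar's option or Google's patches; the fixed mtime of 0 and the 0644/0755 permission policy are assumptions.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def normalize(info):
    """Strip metadata that varies between build environments."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0  # assumption: a fixed timestamp is acceptable
    # Assumed policy: 0755 for directories and executables, 0644 otherwise.
    info.mode = 0o755 if info.isdir() or info.mode & 0o111 else 0o644
    return info


def deterministic_tar(root):
    """Return an uncompressed tar of root with sorted entries."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(dirnames + filenames):
                full = os.path.join(dirpath, name)
                tar.add(full, arcname=os.path.relpath(full, root),
                        recursive=False, filter=normalize)
    return buf.getvalue()


def demo():
    """Same files, different creation order, mtimes and permissions."""
    digests = []
    for order, stamp, mode in ((("a.txt", "b.txt"), 1_000_000, 0o600),
                               (("b.txt", "a.txt"), 2_000_000, 0o664)):
        with tempfile.TemporaryDirectory() as root:
            for name in order:
                path = os.path.join(root, name)
                with open(path, "w") as f:
                    f.write(name)
                os.chmod(path, mode)
                os.utime(path, (stamp, stamp))
            digests.append(hashlib.sha256(deterministic_tar(root)).hexdigest())
    return digests[0] == digests[1]


print(demo())  # True
```

Compression is deliberately left out: as noted above, a threaded compressor can reintroduce a dependency on the number of CPUs.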
Things we could do:
- Fixing libarchive and archive tools: sort the file list, normalize permissions
- Improve FreeBSD jails: number of CPUs, amount of memory
- Missing: a tool to run the same thing twice in different environments
- Improve Linux containers: fake the CPU