Controlling the build environment

45-minute + 10 minute session on day 1

Testing that a piece of software is reproducible means having the ability to control the environment to perform variations that could later be found in the wild. A project could also decide to making a piece of software reproducible by specifying precisely in which environment it should be build. Both approaches basically require the same set of tools, it’s just when that differs.

Non-exhaustive list of variations: username, hostname, build path, timezone, cpu, locale, filesystem ordering

Every project can approach differently:

  • some projects want to set the same build environment for every developers
  • other projects want different environments to produce same builds. In that case, need to test using different environments.

For Debian:

  • sbuild/pbuilder: unpack a minimal chroot, install dependencies, and build
  • script called prebuild, doing build twice with different environments to find sources of non-determinism
  • srebuild script using the list of packages involved in the build initially recorded in .buildinfo files. We can use snapshot.debian.org to recreate environment.
  • on debian, different system has been setup 399 days in the future to check reproducibility
  • some packages have valid-before, valid-after X.509 certificates
  • no network access in the builds
  • issue: libraries compiling optimisations depending on cpu. libgmp: --enable-fat option can disable this. Deciding which optimization to use on user side instead of build side.

On FreeBSD:

  • poudriere
  • tool to setup environment using jails
  • very easy for users
  • can set hostname, timezone, user inside the jail, build path, network, fake kernel version
  • only thing not possible to change from jail: faking the time
  • lot of people use that to build because it’s easy

For single projects:

  • can be easier to use VM
  • VM: need a lot of trust in the VM, we can’t yet reproducibly build images
  • can use containers on Linux

Qubes:

  • does not capture yet the environment
  • build with same user and path

Google:

  • build farm on Google internal network
  • isolation not to enforce things, but avoid accidents
  • bazel, sandbox
  • with bazel, caching when building on the same machine
  • not yet at reproducible cross-host and cross-user

OS X:

  • no jail, standard chroot
  • problem: some standard apple tools broken in chroot
  • no easy way to change hostname and username
  • best approach at the moment, using VM
  • macports: LD_PRELOAD to hide some files: 100% overhead
  • Tor Browser: cross compiling from Linux using Apple SDK
  • Homebrew: user relocatable installation, don’t produce binary identical binaries because of paths. sometimes grepping won’t work

Faking CPU can be tricky. One solution is to tell kvm to stop exposing the host CPU.

disorderfs, implemented with FUSE, allow to test for issues tied to filesystem ordering. By default it will return results of readdir(3) backward. It could be modified to always returned readdir(3) in a sorted order to implement a normalized environment.

Archive creation tools:

  • GNU Tar has an option to do reproducible tarballs
  • bsdtar not yet
  • At Google, fixed: tar, zip, ar, rpm. Not yet upstreamed changes.
  • lzma is threaded: result depends on number of cpu

Things we could do:

  • Fixing libarchive, and archive tools: sort files list, normalize permissions
  • Improve freebsd jails: number of cpu, amount of memory
  • Missing: tool to run the same thing twice in different environments
  • Improve Linux containers: fake cpu