Reproducible Builds,
the first ten years



Holger Levsen
CCCamp 2023

Who am I

  1. Holger Levsen / holger@debian.org, located in Hamburg, Germany. Born at 329 ppm. He/him 🏳️‍🌈🏳️‍⚧️.
  2. Debian user since 1995, contributing since 2001, Debian member since 2007. I ❤️ Debian.
  3. Working on Reproducible Builds since 2014. Aiming to make all ❤️ Free Software reproducible.
  4. Ask me anything, anytime. This is a pretty complex topic.

List of people working on this so far

akira • Alexander Bedrossian • Alexander Borkowski • Alexander Couzens (lynxis) • Alexis Bienvenüe • Alex Wilson • Allan Gunn (gunner) • Amit Biswas • Anders Kaseorg • Andrew Ayer • anonmos1 • Anoop Nadig • Arnout Engelen • Asheesh Laroia • Atharva Lele • Ben Hutchings • Benjamin Hof • Bernhard M. Wiedemann • Boyuan Yang • Brett Smith • Calum McConnell • Carl Dong • Ceridwen • Chris Lamb • Chris Smith • Christoph Berg • Christopher Baines • Chris West • Cindy Kim • Clemens Lang • Clint Adams • Dafydd Harries • Daniel Edgecumbe • Daniel Kahn Gillmor • Daniel Shahaf • Daniel Stender • David A. Wheeler • David Bremner • David del Amo • David Prévot • David Suarez • Dhiru Kholia • Dhole • Drakonis • Drew Fisher • Ed Maste • Edward Betts • Eitan Adler • Elio Qoshi • Eli Schwartz • Emanuel Bronshtein • Emmanuel Bourg • Esa Peuha • Fabian Keil • Fabian Wolff • Felix C. Stegerman • Feng Chai • Frédéric Pierret (fepitre) • Georg Faerber • Georg Koppen • Gonzalo Bulnes Guilpain • Graham Christensen • Guillem Jover • Hannes Mehnert • Hans-Christoph Steiner • Harlan Lieberman-Berg • heinrich5991 • Helmut Grohne • Hervé Boutemy • Holger Levsen (h01ger) • HW42 • Ian Muchina • intrigeri • jajajasalu2 • Jakub Wilk • James Fenn • Jan Nieuwenhuizen • Javier Jardón • Jelle van der Waa • Jelmer Vernooij • Jérémy Bobbio (lunar) • Johannes Schauer Marin Rodrigues • John Scott • Joshua Lock • Joshua Watt • Juan Picca • Juri Dispan • Justin Cappos • kpcyrd • Kushal Das • Levente Polyak • Liyun Li • Ludovic Courtès • Lukas Puehringer • Maliat Manzur • marco • MarcoFalke • Marcus Hoffmann (bubu) • Marek Marczykowski-Górecki • Maria Glukhova • Mariana Moreira • marinamoore • Mathieu Bridon • Mathieu Parent • Mattia Rizzolo • Michael Pöhn • Mike Perry • Morten Linderud • Muz • Mykola Nikishov • Nick Gregory • Nicolas Boulenguez • Nicolas Vigier • Niels Thykier • Niko Tyni • Omar Navarro Leija • opi • Oskar Wirga • Paul Gevers • Paul Spooren • Paul Wise • Peter Conrad • Peter De Wachter • Peter Wu • Philip Rinn • Profpatsch • Reiner Herrmann • Richard Purdie • Robbie Harwood • Roland Clobus • Santiago Torres • Santiago Vila • Sascha Steinbiss • Satyam Zode • Seth Schoen • Scarlett Clark • Simon Josefsson • Simon Schricker • Snahil Singh • Stefano Rivera • Stefano Zacchiroli • Stéphane Glondu • Steven Adger • Steven Chamberlain • Sylvain Beucler • Thomas Vincent • Tianon Gravi • Tobias Stoeckmann • Tom Fitzhenry • Ulrike Uhlig • Vagrant Cascadian • Valentin Lorentz • Valerie R Young • Vipul • Wookey • Ximin Luo

Contributors according to website.git/_data/contributors.yml


About you

  • Who knows about Reproducible Builds, why and how?
  • Who contribute(s|d) to Reproducible Builds?
  • Who knows that Reproducible Builds have been known for more than 10 years? >30 years?
  • Who knows about SBOM? (Software Bill of Materials) = our .buildinfo files from 2014!

We need you!
Please support these efforts

  • Do you think reproducible builds should happen?
    If so, please help. We need your help and support.
  • The goal of this talk is to recap what we have done and to celebrate 10 years of awesomeness by many people, with the aim of getting you informed, excited & involved.
    And to explain that a lot of work and support is still needed, despite all the progress and successes so far!
    We are still far from being done.
  • We can do it! 💪

Introduction

The problem

  • Source code of free software is available…
  • …but most people install pre-compiled binaries.
  • No one really knows how they really correspond (even those building those binaries).
  • As a result there are various classes of supply chain attacks.

Ancient history (>10 years ago)

  • Thread on debian-devel@lists.debian.org from 2007. Deemed undoable by many.
  • Though the idea initially appeared in 2000 on debian-devel@l.d.o.
  • And then in 2017 we learned from John Gilmore on rb-general@lists.reproducible-builds.org that GCC was reproducible in the early 1990s on several architectures!

Fast forward to 2023

    https://lists.zx2c4.com/pipermail/wireguard/2023-April/008045.html
    WireGuard (VPN app for Android) builds are now reproducible; the release is identical on their website, the Google Play Store and F-Droid. 🎯🎯🎯🥳
    (it's more complicated than that, see their mail.)

    We were not even informed. 🥲 People just do reproducible builds as a normal part of their work nowadays. 🤗


Our mission

  • Enable anyone to independently verify that a given source produces bit by bit identical results.
  • Reproducible Builds are an important building block in making supply chains more secure. Nothing more, nothing less.
  • (Un)secure software built reproducibly still remains (un)secure software. However, with reproducible builds you can be sure that you are running the software you want to be running, built from the sources you want to be using.
  • By 2023 Reproducible Builds has been widely understood:
    https://reproducible-builds.org/resources/
    https://reproducible-builds.org/docs/
    https://reproducible-builds.org/docs/publications/
  • https://www.whitehouse.gov/briefing-room/statements-releases/2021/06/08/...
    • requires "Software Bills of Materials" (SBOMs) for governmental software
    • so far only recommends reproducible builds / verified SBOMs

https://reproducible-builds.org/docs/definition/

  • When is a build reproducible?
  • A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.
  • The relevant attributes of the build environment, the build instructions and the source code as well as the expected reproducible artifacts are defined by the authors or distributors. The artifacts of a build are the parts of the build results that are the desired primary output.
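
As a rough illustration of the definition above, two independent builds can be compared by hashing every artifact they produced. This is only a sketch: the function names and the choice of SHA-256 are assumptions made for this example, not part of the definition.

```python
import hashlib
from pathlib import Path


def artifact_digests(build_dir):
    """Map each file's relative path to the SHA-256 digest of its contents."""
    root = Path(build_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def is_reproducible(build_a, build_b):
    """True if both builds produced the same files with bit-for-bit identical contents."""
    return artifact_digests(build_a) == artifact_digests(build_b)
```

Calling `is_reproducible("build1/", "build2/")` on the output directories of two builds returns `True` only if every file exists in both and has identical bytes.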

How did we get there?

  • Money
  • Edward Snowden

Why money?

  • Bitcoin
  • Bitcoin (the software) was made reproducible in 2011.

Why Snowden?

  • Well...
  • Tor Browser was made reproducible in 2013 by Mike Perry.
  • That's Firefox. One of the biggest software projects in the world.

How did we really get there?

  • Money / Bitcoin
  • Edward Snowden / Tor Browser
  • ...and a LOT of work by MANY people over 10 years

2013 and 2014

    • Lunar's BoF at DebConf13.
    • another BoF at DebConf14
    • patches for dpkg: sorting fixes and .buildinfo files (SBOM!)
    • in September 2014 I started systematic builds of Debian packages, twice. First just 100 packages, then all of them.
    • Mike Perry and Seth Schoen gave a presentation at CCCongress in December 2014 showing "my" graphs. Wow.

    Debian unstable, 20150131

    2015

  • FOSDEM talk by Lunar and myself, inviting the Free Software world at large to collaborate and tackle this problem.
  • CCCamp presentation by Lunar, showing many problems and their solutions.
  • 1st Reproducible Builds Summit in Athens.
  • SOURCE_DATE_EPOCH spec
  • diffoscope
    Common reasons for unreproducibility:

  • timestamps, timestamps, timestamps
  • timestamps, timestamps, timestamps
  • build paths, build paths
  • all the rest

    Resources about unreproducibility:

    • 422 known issue types in reproducible-notes.git
    • https://reproducible-builds.org/docs/
    • Lunar's talk at CCCamp 2015
    • https://github.com/bmwiedemann/theunreproduciblepackage
    • It's much easier to show common pitfalls making a package unreproducible than the opposite...
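
    As a small counterexample, here is a sketch of avoiding two of the pitfalls named earlier, embedded timestamps and unstable file ordering, when packing an archive. The function name and the fixed metadata values are assumptions for illustration; build paths are not addressed here.

```python
import io
import tarfile


def deterministic_tar(files, mtime):
    """Build an uncompressed tar archive in memory from a {name: bytes} mapping.

    Entries are added in sorted order and all timestamp and ownership
    metadata is fixed, so the same input always yields the same bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime            # fixed instead of "now"
            info.mode = 0o644
            info.uid = info.gid = 0       # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

    A build script could pass `int(os.environ.get("SOURCE_DATE_EPOCH", "0"))` as `mtime` so that the archive's timestamps follow the source rather than the build time.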

    3000 reproducibility-related bugs fixed (mostly upstreamed), 500 patches pending...

    20000 bugs in 10 years ~= 5 per day

    Detour: some unexpected benefits of reproducible builds

    • Lower development costs and increased development speed through less developer time wasted on waiting for builds.
    • Software development: does this change really have no effect / the desired effect only?
    • Licence compliance: you can only be sure a binary is Free Software if it can be (re-)built reproducibly from a given source.
    • Reproducible verified SBOMs.

    diffoscope

    • Who knows about diffoscope?
    • Who uses diffoscope?
    • diffoscope tries to get to the bottom of what makes files or directories different. It will recursively unpack archives of many kinds and transform various binary formats into more human-readable form to compare them.
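
    The idea of recursive unpacking can be illustrated with a toy comparison that only understands ZIP archives. This is not diffoscope's implementation or API; `describe_difference` and its output format are made up for this sketch.

```python
import io
import zipfile


def _is_zip(data):
    """Check whether a byte string looks like a ZIP archive."""
    return zipfile.is_zipfile(io.BytesIO(data))


def describe_difference(a, b, path="root"):
    """Return human-readable notes on how two byte strings differ.

    ZIP archives are unpacked and their members compared recursively;
    anything else falls back to reporting the first differing byte offset.
    """
    if a == b:
        return []
    if _is_zip(a) and _is_zip(b):
        notes = []
        with zipfile.ZipFile(io.BytesIO(a)) as za, zipfile.ZipFile(io.BytesIO(b)) as zb:
            names_a, names_b = set(za.namelist()), set(zb.namelist())
            for name in sorted(names_a ^ names_b):
                side = "first" if name in names_a else "second"
                notes.append(f"{path}/{name}: only in {side} archive")
            for name in sorted(names_a & names_b):
                notes.extend(
                    describe_difference(za.read(name), zb.read(name), f"{path}/{name}")
                )
        # Members match but the containers differ: timestamps, ordering, etc.
        return notes or [f"{path}: contents match, archive metadata differs"]
    offset = next(
        (i for i, (x, y) in enumerate(zip(a, b)) if x != y),
        min(len(a), len(b)),  # one input is a prefix of the other
    )
    return [f"{path}: first difference at byte offset {offset}"]
```

    Even this toy version shows why the approach helps: instead of "the two archives differ", the report points at the member that changed, or says that only container metadata such as timestamps differs.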

  • Text and HTML output
  • File formats supported include: Android APK files, Android boot images, Android package resource table (ARSC), Apple Xcode mobile provisioning files, ar(1) archives, ASM Function, Berkeley DB database files, bzip2 archives, character/block devices, ColorSync colour profiles (.icc), Coreboot CBFS filesystem images, cpio archives, Dalvik .dex files, Debian .buildinfo files, Debian .changes files, Debian source packages (.dsc), Device Tree Compiler blob files, directories, ELF binaries, ext2/ext3/ext4/btrfs/fat filesystems, Flattened Image Tree blob files, FreeDesktop Fontconfig cache files, FreePascal files (.ppu), Gettext message catalogues, GHC Haskell .hi files, GIF image files, Git repositories, GNU R database files (.rdb), GNU R Rscript files (.rds), Gnumeric spreadsheets, GPG keybox databases, Gzipped files, Hierarchical Data Format database, HTML files (.html), ISO 9660 CD images, Java class files, Java .jmod modules, JavaScript files,
    JPEG images, JSON files, Linux kernel images, LLVM IR bitcode files, local (UNIX domain) sockets and named pipes (FIFOs), LZ4 compressed files, lzip compressed files, macOS binaries, Microsoft Windows icon files, Microsoft Word .docx files, Mono ‘Portable Executable’ files, Mozilla-optimized .ZIP archives, Multimedia metadata, OCaml interface files, Ogg Vorbis audio files, OpenOffice .odt files, OpenSSH public keys, OpenWRT package archives (.ipk), PDF documents, PE32 files, PGP signatures, PGP signed/encrypted messages, PNG images, PostScript documents, Public Key Cryptography Standards (PKCS) files (version #7), Python pyc files, RPM archives, Rust object files (.deflate), Sphinx inventory files, SQLite databases, SquashFS filesystems, symlinks, tape archives (.tar), tcpdump capture files (.pcap), text files, TrueType font files, U-Boot legacy image files, WebAssembly binary module, XML binary schemas (.xsb), XML files, XMLB files, XZ compressed files, ZIP archives and Zstandard compressed files.
  • Fallback on hexdump comparison, fuzzy-matching to handle renamings, and much more!

    diffoscope example output

  • Example diffoscope output for https-everywhere 5.0.6 vs 5.0.7
  • https://try.diffoscope.org
  • https://diffoscope.org

    SOURCE_DATE_EPOCH

    • Who knows about SOURCE_DATE_EPOCH?
    • Build time stamps are meaningless. SOURCE_DATE_EPOCH describes the time of the last modification of the source (in seconds since the Unix epoch).
    • Supported by a lot of software today.
    • The specification is from 2015 and was updated in 2017.
    • https://reproducible-builds.org/docs/source-date-epoch/
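
    A minimal sketch of how a build tool might honour SOURCE_DATE_EPOCH: use the variable when it is set and fall back to the current time otherwise. The helper names `build_timestamp` and `version_banner` are hypothetical, chosen for this example.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the timestamp to embed in build output, as an aware UTC datetime."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    # Without the variable the build falls back to "now" and is not reproducible.
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def version_banner(name, environ=os.environ):
    """Format a banner whose date depends only on the source, not the build time."""
    return f"{name} built on {build_timestamp(environ):%Y-%m-%d}"
```

    Formatting the date in UTC matters as well: a local timezone would reintroduce a dependency on the build machine.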

    https://reproducible-builds.org

    Reproducible Builds Summits

  • 2015 Athens
  • 2016 Berlin
  • 2017 Berlin
  • 2018 Paris
  • 2019 Marrakech
  • 2022 Venice
  • 2023 Hamburg

    Projects at Reproducible Builds Summits

    Alpine Linux, Apache Maven, Arch Linux, baserock, Bazel, bootstrappable.org, coreboot, CoyIM, Debian, Eclipse Adoptium, EdgeBSD, F-Droid, Fedora, FreeBSD, GNU Guix, GNU Mes, Google, Guardian Project, Guix, Homebrew, Huawei, Indiana University (IU), in-toto, IPFS, LEAP, LEDE, MacPorts, Max Planck Institute for Security and Privacy (MPI-SP), Microsoft, MirageOS, muinín, NetBSD, New York University (NYU), NixOS, Octez / Tezos, openSUSE, OpenWrt, pantsbuild.org, pkgsrc, Qubes OS, Quinel Ltd, repeatr.io, riot-os.org, Software Freedom Conservancy, subuser.org, Tails, Tor Project, Ubuntu, University of Pennsylvania (UPenn) and Warpforge.

    (There were more but we were asked to only mention these.)

    Reproducible-builds.org funding

    • r-b.o has been a Software Freedom Conservancy (SFC) project since 2018, currently funding Chris Lamb, Mattia Rizzolo, Vagrant Cascadian and myself.
    • Funding needed for the summit in November in Hamburg.
    • Funding needed to support our continuous work: community work, fixing upstreams, developing software, designing processes & POCs...
    • Thank you! ❤️

    Short overviews of various projects

    results for Debian unstable, until 20230804

    Debian trixie, 20230804

    CI reproducibility of Debian amd64

    Debian suite   reproducible     unreproducible   fails to build   other
    stretch        23040 (93.2%)    1514 (6.1%)       85 (0.3%)        80 (0.4%)
    buster         26653 (93.9%)    1405 (4.9%)      232 (0.8%)       108 (0.4%)
    bullseye       29603 (95.9%)    1405 (2.7%)      232 (1.0%)       108 (0.4%)
    bookworm       32692 (95.3%)    1146 (3.3%)      379 (1.1%)        83 (0.3%)

    https://beta.tests.reproducible-builds.org/debian

    Debian policy

    • 2017: packages should build reproducibly.
    • 2023? reproducible packages must not regress.
    • 2025? NEW packages must build reproducibly (to be allowed into testing and therefore into stable).
    • 2027? packages must build reproducibly (to be allowed into testing and stable).

    Debian policy

    • 2017: packages should build reproducibly.
    • 2023? reproducible packages must not regress. NEW packages must build reproducibly (to be allowed into testing and therefore into stable).
    • 2025? packages must build reproducibly (to be allowed into testing and stable).

    100%!

    • 100% reproducible is a political decision and nothing technical.
    • Thus we need to change debian-policy!
    • Thus Debian needs to change debian-policy!

    100% reproducibility in theory is not enough, by far.

    • Then we need rebuilders.
    • Thus we need a working snapshot.debian.org service.
    • And then we need reproducible transparency logs and logic for what to do when...
    • And then we also need binary transparency logs (also because we haven't reached 100% yet).
    • The above is true for all projects, not just Debian.
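
    What a user-side tool could do with rebuilder results is still an open design question. The following is only a sketch under a hypothetical policy: a binary is accepted if at least `required` rebuilders obtained the same digest as the distributed binary and no successful rebuild disagrees. None of these names or rules come from an existing tool.

```python
def verify_with_rebuilders(distributed_digest, rebuilder_digests, required=2):
    """Decide whether enough independent rebuilders confirm a binary.

    rebuilder_digests maps a rebuilder name to the digest it obtained,
    or to None if its rebuild failed. The policy here is hypothetical.
    """
    confirming = sorted(
        name for name, digest in rebuilder_digests.items()
        if digest == distributed_digest
    )
    dissenting = sorted(
        name for name, digest in rebuilder_digests.items()
        if digest is not None and digest != distributed_digest
    )
    return {
        "trusted": len(confirming) >= required and not dissenting,
        "confirming": confirming,
        "dissenting": dissenting,
    }
```

    Even this small example raises the policy questions mentioned above: how many rebuilders are enough, and what should happen when a single one disagrees?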

    Short overview of reproducibility of various projects (AIUI)

    • Tails: "easy", pragmatically solved.
    • Arch Linux: has rebuilders and a snapshot binary archive, but lacks further infrastructure; user tools like pacman-bintrans are merely PoCs.
    • Arch Linux is 86.4% reproducible with 1701 bad and 10849 good packages.
      [core] repository is 93.3% reproducible with 17 bad and 238 good packages.
      [extra] repository is 94.1% reproducible with 171 bad and 2860 good packages.
      [community] repository is 83.8% reproducible with 1481 bad and 7674 good packages.
      
    • SuSE: active development, by one person, not enabled in official builds

    Short overview of reproducibility of various projects, continued

    • NixOS: https://reproducible.nixos.org: 1570 out of 1572 (99.87%) paths in the minimal installation image are reproducible.
    • GNU Guix: also reproducible by design (like NixOS) - guix-challenge
    • Yocto: support for reproducible images.
    • F-Droid: supports reproducible builds though no UI (manual web crawling needed) nor promises.

    Short overview of reproducibility of various projects, continued

    • Alpine: basic support.
    • FreeBSD/NetBSD/OpenBSD: basic support.
    • Fedora/Red Hat/Ubuntu: not interested, it seems.
      • though Fedora 38 (April 2023) enabled clamping mtimes of package files using SOURCE_DATE_EPOCH from changelog when building packages.

    Summary of various projects

      Today many projects support reproducible builds, but it's unclear what that means, how it's enforced and how users can know and be confident.

      I call it reproducible in theory or in CI.

      This is a massive success! This was thought impossible not long ago!

    Theory vs Practice

    • In theory, we are done. In practice, we have shown that reproducible builds can be done in theory.
    • Then we also need many rebuilders (!= CI builders), we need to store the results somewhere, we need to define criteria for how tools should treat that data, and then we need those tools...
    • However, those missing 5% are also crucial, or at least 1% of them. For Debian, 1% means 300 software packages...

    Summary

    • Many projects support reproducible builds in theory today, but it's unclear what that means in practice and how users can know and be confident.
    • This is a huge success.
    • Next: finish those last 1-5% upstream.
    • Next: create infrastructure of rebuilders in practice.
    • Next: create infrastructure, processes and tools to securely use those results...
    • Next: project-level consensus and commitment to reproducible builds in practice.

    Thank you
    … and all the contributors out there!

    Any questions? 🤷

    Holger Levsen <holger@reproducible-builds.org>
    B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C

    1337 most popular unreproducible source packages

    bind9 bluez ffmpeg gegl gnupg2 graphviz grub2 guile-2.2 ibus icu imagemagick libayatana-appindicator libdmapsharing libjcat libu2f-host libzstd lirc lynx mako nss numpy openh264 p7zip qtbase-opensource-src qtmultimedia-opensource-src qtquickcontrols2-opensource-src qtsensors-opensource-src qtspeech-opensource-src qtsvg-opensource-src qttools-opensource-src qtwebchannel-opensource-src qtx11extras-opensource-src underscore vlc xorg-docs

    build-essential-depends unreproducible source packages

    auctex black bluez codenarc cxxtest dask dejagnu doxygen eccodes eckit efl emacs emoslib ffmpeg fish freetds gdb gdcm gmetrics gnupg2 graphviz groovy gtk-sharp2 guile-3.0 h2database hevea ibus icu imagemagick infinipath-psm ipyparallel ldc libadwaita-1 libapache-poi-java libcamera libzstd linux86 lirc lombok lucene4.10 lucene8 lynx mako mono mpich mrmpi nbconvert nbsphinx node-mocha nss numpy nunit odc openh264 openjfx oxygen-icons5 pandas parallel pmix pstoedit pupnp python-graphviz python-jsonschema python-xarray qemu qt6-5compat qt6-declarative qtbase-opensource-src qtconnectivity-opensource-src qtmultimedia-opensource-src qtscript-opensource-src qtsensors-opensource-src qtserialport-opensource-src qtspeech-opensource-src qtsvg-opensource-src qttools-opensource-src qtwebchannel-opensource-src qtwebsockets-opensource-src qtx11extras-opensource-src r-base ruby-pygments.rb scikit-learn scipy scons secilc shaderc sphinx-gallery statsmodels systemtap twisted underscore valgrind vlc xmlstarlet xorg-docs