Reproducible Builds,
the first ten years



Holger Levsen
Debian Reunion 2023
Hamburg Altona, Germany

Disclaimer for this talk from last month: This talk is a work in progress,
trying to continue to document the history of Reproducible Builds.
This talk is an early beta release. I hope you'll enjoy it!

And then... this morning I wanted to cancel this talk, as I didn't manage to polish it as much as I wanted, or as much as would be appropriate.

As you can see I decided against canceling it. I just hope I won't regret this.

I hope YOU won't regret this.

Why this talk hardly progressed from the last beta.

"No USB devices were harmed during the preparation of this talk."
More seriously, my laptop broke the week before last, when I went to Denmark, where I wanted to prepare this event and talk. And all because I did what's recommended: updating firmware...
And then I had to make this event happen. So this talk will be way more chaotic than I would like. I'm sorry and very unhappy about this. :(
I'm sure I'll improve, once again, eventually.
Anyhow...

Maybe the talk title should have been:
my first 10 years with reproducible builds
- though this is not about my work:

Reproducible builds, like Free Software in general, is a collective effort.

And the idea is also much older than 10 years...

very incomplete list of people
who have been working on this so far

akira • Alexander Bedrossian • Alexander Borkowski • Alexander Couzens (lynxis) • Alexis Bienvenüe • Alex Wilson • Allan Gunn (gunner) • Amit Biswas • Anders Kaseorg • Andrew Ayer • anonmos1 • Anoop Nadig • Arnout Engelen • Asheesh Laroia • Atharva Lele • Ben Hutchings • Benjamin Hof • Bernhard M. Wiedemann • Boyuan Yang • Brett Smith • Calum McConnell • Carl Dong • Ceridwen • Chris Lamb • Chris Smith • Christoph Berg • Christopher Baines • Chris West • Cindy Kim • Clemens Lang • Clint Adams • Dafydd Harries • Daniel Edgecumbe • Daniel Kahn Gillmor • Daniel Shahaf • Daniel Stender • David A. Wheeler • David Bremner • David del Amo • David Prévot • David Suarez • Dhiru Kholia • Dhole • Drakonis • Drew Fisher • Ed Maste • Edward Betts • Eitan Adler • Elio Qoshi • Eli Schwartz • Emanuel Bronshtein • Emmanuel Bourg • Esa Peuha • Fabian Keil • Fabian Wolff • Felix C. Stegerman • Feng Chai • Frédéric Pierret (fepitre) • Georg Faerber • Georg Koppen • Gonzalo Bulnes Guilpain • Graham Christensen • Guillem Jover • Hannes Mehnert • Hans-Christoph Steiner • Harlan Lieberman-Berg • heinrich5991 • Helmut Grohne • Hervé Boutemy • Holger Levsen (h01ger) • HW42 • Ian Muchina • intrigeri • jajajasalu2 • Jakub Wilk • James Fenn • Jan Nieuwenhuizen • Javier Jardón • Jelle van der Waa • Jelmer Vernooij • Jérémy Bobbio (lunar) • Johannes Schauer Marin Rodrigues • John Scott • Joshua Lock • Joshua Watt • Juan Picca • Juri Dispan • Justin Cappos • kpcyrd • Kushal Das • Levente Polyak • Liyun Li • Ludovic Courtès • Lukas Puehringer • Maliat Manzur • marco • MarcoFalke • Marcus Hoffmann (bubu) • Marek Marczykowski-Górecki • Maria Glukhova • Mariana Moreira • marinamoore • Mathieu Bridon • Mathieu Parent • Mattia Rizzolo • Michael Pöhn • Mike Perry • Morten Linderud • Muz • Mykola Nikishov • Nick Gregory • Nicolas Boulenguez • Nicolas Vigier • Niels Thykier • Niko Tyni • Omar Navarro Leija • opi • Oskar Wirga • Paul Gevers • Paul Spooren • Paul Wise • Peter Conrad • Peter De Wachter • Peter Wu • Philip Rinn • Profpatsch • Reiner Herrmann • Richard Purdie • Robbie Harwood • Roland Clobus • Santiago Torres • Santiago Vila • Sascha Steinbiss • Satyam Zode • Seth Schoen • Scarlett Clark • Simon Josefsson • Simon Schricker • Snahil Singh • Stefano Rivera • Stefano Zacchiroli • Stéphane Glondu • Steven Adger • Steven Chamberlain • Sylvain Beucler • Thomas Vincent • Tianon Gravi • Tobias Stoeckmann • Tom Fitzhenry • Ulrike Uhlig • Vagrant Cascadian • Valentin Lorentz • Valerie R Young • Vipul • Wookey • Ximin Luo

(Huge sorry if YOU are missing, please let's fix this. The real list is at least twice as big..!)

Who am I

  1. Holger Levsen / holger@debian.org, located in Hamburg, Germany
  2. Debian user since 1995, contributing since 2001, Debian member since 2007. I ❤️ Debian.
  3. Working on Reproducible Builds since 2014, trying to make all ❤️ Free Software reproducible.
  4. Ask me anything, anytime. This is a pretty complex topic.

About you

  • Who knows about Reproducible Builds, why and how?
  • Who contribute(s|d) to Reproducible Builds?
  • Who knows that Reproducible Builds have been known for more than 10 years? >30 years?
  • Who knows about SBOM? (Software Bill of Materials) = our .buildinfo files from 2014!

Introduction

The problem

  • Source code of free software available
  • …most people install pre-compiled binaries
  • No one knows whether they really correspond.
  • As a result there are various classes of supply chain attacks.

The solution

  • Enable anyone to independently verify that a given source produces bit by bit identical results.
  • Reproducible Builds are an important building block in making supply chains more secure. Nothing more, nothing less.
  • As a side effect: you can only be sure a binary is free software if it has been reproduced. Someone else's binary is only certainly free software if it's reproducible!

The definition

  • When is a build reproducible?
  • A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.
  • The relevant attributes of the build environment, the build instructions and the source code as well as the expected reproducible artifacts are defined by the authors or distributors. The artifacts of a build are the parts of the build results that are the desired primary output.
  • https://reproducible-builds.org/docs/definition/
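
A minimal sketch of what "bit-by-bit identical" means in practice: two independently built artifacts count as reproduced only if their bytes, and hence their cryptographic digests, match. The function names and the demo files are assumptions made up for this illustration, not part of any Reproducible Builds tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(artifact_a: Path, artifact_b: Path) -> bool:
    """True if both artifacts are bit-by-bit identical (same digest)."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "build1.bin"    # hypothetical artifact names
        second = Path(tmp) / "build2.bin"
        first.write_bytes(b"hello world\n")
        second.write_bytes(b"hello world\n")
        print(is_reproduced(first, second))
        # A single embedded timestamp is enough to break reproducibility.
        second.write_bytes(b"hello world, built at 12:34\n")
        print(is_reproduced(first, second))
```

Comparing digests only says whether two builds differ, not why; finding the reason is what diffoscope is for.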

By now this has been widely and largely understood:
https://reproducible-builds.org/resources/
https://reproducible-builds.org/docs/
https://reproducible-builds.org/docs/publications/

https://www.whitehouse.gov/briefing-room/statements-releases/2021/06/08/...

show presentation from Mike Perry and Seth Schoen from 2013.

https://reproducible-builds.org

Fast forward to 2023

https://lists.zx2c4.com/pipermail/wireguard/2023-April/008045.html
WireGuard (VPN app for Android) builds are now reproducible, their release is identical on their website, Google Play Store and F-Droid. 🎯🎯🎯🥳
(it's more complicated than that, see their mail.)

We were not even informed. 🥲 People just do reproducible builds as a normal part of their work nowadays. 🤗

How did we get there?

  • Money
  • Edward Snowden

Why money?

  • Bitcoin
  • Gitian
  • Bitcoin (the software) was reproducible in 2011.

Why Snowden?

  • Well...
  • Mike Perry made Tor Browser reproducible in 2013.
  • That's Firefox. One of the biggest software projects in the world.
  • Lunar's BoF at DebConf13.

Even earlier works

  • Show that thread on debian-devel@lists.debian.org from 2007
  • Though the idea initially appeared in 2000 on debian-devel@l.d.o.
  • And then in 2017 we learned from John Gilmore on rb-general@lists.reproducible-builds.org that GCC was reproducible in the early 1990s on several architectures!

Detour: early computing

  • In 2015 I heard rumors that in the past slot machines had to be reproducible, due to VAT fraud fears.
  • Fact: when machines had 4 KB of memory, some people knew every bit. That culture got lost when 640 KB were not enough anymore...
  • When machines got closer to 640 gigabytes of memory, the idea that someone would know every bit had become unimaginable.

Detour: unexpected benefits of reproducible builds

  • in 2022 I learned about an Italian company doing certification for gambling machines using diffoscope...
  • Licence compliance: you can only be sure a binary is Free Software if it can be (re-)built reproducibly from a given source.
  • Software development: does this change really have no effect / the desired effect only?

Detour: diffoscope

  • Who knows about diffoscope?
  • Who uses diffoscope?
  • show https://diffoscope.org
  • mention https://try.diffoscope.org

Back to 2013 onward

  • Lunar's BoF at DebConf13.
  • another BoF at DebConf14
  • patches for dpkg: sorting fixes and .buildinfo files (SBOM!)
  • in September 2014 I started systematic builds of Debian packages, building each twice. First just 100 packages, then all of them.
  • Mike Perry and Seth Schoen gave that presentation at CCCongress in December 2014 showing "my" graphs. Wow.

Debian unstable, 20150131

Debian unstable, 20230424

2015

  • FOSDEM talk by Lunar and myself, inviting the Free Software world at large to collaborate and tackle this problem.
  • CCCamp presentation by Lunar, showing many problems and their solutions.
  • SOURCE_DATE_EPOCH specification: https://reproducible-builds.org/specs/source-date-epoch/
  • 1st Reproducible Builds Summit in Athens.

Reproducible Builds Summits

  • 2015 Athens
  • 2016 Berlin
  • 2017 Berlin
  • 2018 Paris
  • 2019 Marrakech
  • 2022 Venice
  • 2023 Hamburg

Projects at Reproducible Builds Summits

Alpine Linux, Apache Maven, Arch Linux, baserock, Bazel, bootstrappable.org, coreboot, CoyIM, Debian, Eclipse Adoptium, EdgeBSD, F-Droid, Fedora, FreeBSD, GNU Guix, GNU Mes, Google, Guardian Project, Guix, Homebrew, Huawei, Indiana University (IU), in-toto, IPFS, LEAP, LEDE, MacPorts, Max Planck Institute for Security and Privacy (MPI-SP), Microsoft, MirageOS, muinín, NetBSD, New York University (NYU), NixOS, Octez / Tezos, openSUSE, OpenWrt, pantsbuild.org, pkgsrc, Qubes OS, Quinel Ltd, repeatr.io, riot-os.org, Software Freedom Conservancy, subuser.org, Tails, Tor Project, Ubuntu, University of Pennsylvania (UPenn) and Warpforge.

(There were more but we were asked to only mention these.)

Common reasons for unreproducibilities

  • timestamps, timestamps, timestamps
  • timestamps, timestamps, timestamps
  • build paths, build paths
  • all the rest
  • I'll just explain here how to address timestamps and build paths embedded in build products.

SOURCE_DATE_EPOCH

  • who knows about SOURCE_DATE_EPOCH?
  • build timestamps are meaningless. SOURCE_DATE_EPOCH describes the time of the last modification of the source.
  • supported by a lot of software today.
  • show https://reproducible-builds.org/docs/source-date-epoch/
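
A small sketch of how a build tool might honour SOURCE_DATE_EPOCH: if the variable is set, use it instead of the current time, and format it in UTC so the builder's timezone does not leak into the output. The function names are made up for illustration, and the epoch value in the demo is only an example.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH if set, else fall back to the current time."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))


def build_date_string() -> str:
    """Format the timestamp in UTC so the local timezone cannot leak in."""
    stamp = datetime.fromtimestamp(build_timestamp(), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d")


if __name__ == "__main__":
    # Example value only; a real build would derive it from the source,
    # e.g. from the latest changelog entry or commit.
    os.environ["SOURCE_DATE_EPOCH"] = "1682294400"
    print(build_date_string())  # prints 2023-04-24
```

With the variable set, every rebuild embeds the same date, no matter when or where it runs.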

build path variation

  • The solution is simple. But it took me almost 10 years to get there.
  • First we tried to fix them. Still a valid and useful approach.
  • Then we quickly came up with a workaround: record the build path and do rebuilds in the same build path.
  • in April 2023 in a discussion with Vagrant a much simpler solution came up: just don't vary the build path, instead use predictable build paths like /buildpath/linux-6.2.23
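
A toy illustration of why a predictable build path helps, assuming a build that embeds its build path in its output. `predictable_build_path` and `toy_build` are hypothetical names for this sketch, and `/buildpath` is just the example root from the bullet above; nothing here is actual Debian tooling.

```python
from pathlib import PurePosixPath

BUILD_ROOT = PurePosixPath("/buildpath")  # assumed fixed root for all builds


def predictable_build_path(source: str, version: str) -> PurePosixPath:
    """Derive the build directory from source name and version only."""
    return BUILD_ROOT / f"{source}-{version}"


def toy_build(build_path: PurePosixPath) -> bytes:
    """Stand-in for a real build that embeds its build path in the output."""
    return f"built in {build_path}\n".encode()


if __name__ == "__main__":
    first = toy_build(predictable_build_path("linux", "6.2.23"))
    second = toy_build(predictable_build_path("linux", "6.2.23"))
    print(first == second)  # same path, same bytes
    # A randomised per-build directory changes the embedded path.
    varied = toy_build(PurePosixPath("/tmp/build-a1b2c3/linux-6.2.23"))
    print(first == varied)
```

Because the path depends only on the source name and version, any rebuilder can recreate it without having to look it up in a recorded build environment first.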

Debian unstable, 20230424

Debian bookworm, 20230424

more history needs to be written

  • https://reproducible-builds.org/docs/history/ ends in 2015.😟
  • Arch Linux has done a lot. Rebuilders and pacman-bintrans.
  • CI builds vs rebuilders.
  • Fedora finally enabled r-b macros for RPM.
  • SBOM should be mentioned. And that without reproducible builds SBOMs are rather meaningless, while with them, those are verified SBOMs!
  • Help would be very much welcome to write our history. While it's fresh, and not 30 years later.

    Thank you
    … and all the contributors out there!

    Do you think reproducible builds should happen?
    If so, please help.
    We need your help.

    I still haven't found what I'm looking for
    but I'm confident we'll get there, eventually!

    Holger Levsen <holger@debian.org>
    B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C


    The end?

    Or do you want to hear more?

    The following stats are mostly from September 2022...

    as the saying goes: "please excuse this long letter, I didn't have the time for a shorter one."

    Short overview of reproducibility of other projects (all AIUI)

    • Tails: "easy", pragmatically "solved" but not systematically...
    • Arch Linux: has rebuilders, though also lacks user tools and/or other integration
    • Arch Linux is 86.4% reproducible with 1701 bad and 10849 good packages.
      [core] repository is 93.3% reproducible with 17 bad and 238 good packages.
      [extra] repository is 94.1% reproducible with 171 bad and 2860 good packages.
      [community] repository is 83.8% reproducible with 1481 bad and 7674 good packages.
      
    • SuSE: active development, by one person, not enabled in official builds

    Short overview of reproducibility of other projects (all AIUI), continued

  • NixOS: https://reproducible.nixos.org: 1570 out of 1572 (99.87%) paths in the minimal installation image are reproducible!
  • GNU Guix: also reproducible by design (like NixOS) - guix-challenge
  • Yocto: support for reproducible images
  • F-Droid: supports reproducible builds though no UI (manual web crawling needed) nor promises
    • "Corona Contact Tracing Germany": update problem due to unreproducibility

    Short overview of reproducibility of other projects (all AIUI), continued

  • Alpine: basic support
  • FreeBSD/NetBSD/OpenBSD: basic support
  • Fedora/Red Hat/Ubuntu: not interested it seems
  • though Fedora recently enabled r-b features via a macro

    Summary of reproducibility of other projects (all AIUI)

    Many projects support reproducible builds by now, but it's unclear what that means, how it's enforced and how users can know and be confident.

    Also: 96% is hardly ever enough, bad for two reasons..

    🎶 I still haven't found what I'm looking for 🎶.

    Some more information ;-)

    I probably didn't backdoor this

  • https://github.com/kpcyrd/i-probably-didnt-backdoor-this
  • a fine manual...
  • simple hello world in Rust
  • Reproducing the ELF binary
  • Reproducing the Docker image
  • Reproducing the Arch Linux package

    The unreproducible package

  • https://github.com/bmwiedemann/theunreproduciblepackage
  • It's much easier to show common pitfalls making a package unreproducible than the opposite...
  • https://reproducible-builds.org/docs

    Debian

    Reproducible Builds were first discussed at DebConf13...

    ..in a BoF hosted by Lunar sparking all of this. DebConf14 had another BoF.

    Automated test builds at the end of 2014.

    FOSDEM 2015: getting the wider FLOSS community involved.

    diffoscope!

    First summit at the end of 2015 in Athens.

    DebConf15 had four people giving the talk...

    “How can we get this done...???”

    We wondered at the beginning of the Stretch development cycle.

    Reproducible talks at least...?

    DebConf16

    DebConf17

    DebConf18

    DebConf19

    DebConf20

    DebConf21

    “I feel I have given warnings that the next Debian release will not be reproducible for years.” is a quote from last year.

    ...and I feel fine! 😀

    Schrödinger's h01ger: frustrated and happy.

    Indeed I have given warnings that the next Debian release will not be reproducible for years...

    ...and I feel fine! 😀

    Let me explain. First the frustration...

    Debian 9 / stretch

    The "reproducible in theory but not in practice" release

    Debian 10 / buster

    The "we could be reproducible but we are not" release

    Debian 11 / bullseye

    The "we are almost there but still haven't sorted out some requirements" release

    Debian 9 / stretch

    The "reproducible in theory but not in practice" release

    Debian 10 / buster

    The "we could be reproducible but we are not" release

    Debian 11 / bullseye

    The "we almost made it" release

    Debian 12 / bookworm

    The first Debian release with some meaningful reproducibility?

    The previous two slides were from last year...


    Debian 12 / bookworm

    The first Debian release with some meaningful/usable reproducibility?!?

    Debian 13 / trixie

    I still haven't found what I'm looking for

    Debian issues in depth

    96% reproducibility is a lie.

    or rather: 96% are CI results.

    I'll explain what's "wrong" with CI results in a moment...

    96% reproducibility is neither a lie nor useless...


    96% in detail

    • we are at 96.1% (29651 out of 30869 source packages) CI reproducibility for bullseye now.

    • that's about 2 percentage points up compared to buster (93.9%)
    • or almost 3000 more reproducible packages (29651 instead of 26682 in buster)
    • or even more impressive: we've solved one third of the remaining 6% buster had...
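
The figures on this slide can be re-derived from the package counts. A small sketch: the helper name `percent` is made up for illustration, and the buster percentage is taken from the slide as-is, since buster's total package count is not given here.

```python
def percent(good: int, total: int) -> float:
    """Share of reproducible source packages, rounded to one decimal."""
    return round(100 * good / total, 1)


BULLSEYE_GOOD, BULLSEYE_TOTAL = 29651, 30869  # counts from the slide
BUSTER_GOOD, BUSTER_PERCENT = 26682, 93.9     # buster total not given

if __name__ == "__main__":
    bullseye = percent(BULLSEYE_GOOD, BULLSEYE_TOTAL)
    print(f"bullseye: {bullseye}%")
    print(f"more reproducible packages: {BULLSEYE_GOOD - BUSTER_GOOD}")
    # Share of buster's unreproducible remainder that got fixed since.
    solved = (bullseye - BUSTER_PERCENT) / (100 - BUSTER_PERCENT)
    print(f"share of buster's remainder solved: {solved:.0%}")
```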

    Did I say bullseye?

    • So what about bookworm?
    • We are at 96.2% (33018 out of 34313 source packages) CI reproducibility for bookworm.
    • YAY.

    CI versus rebuilds:

    • We have no Debian infrastructure rebuilding Debian packages. The reproducible-builds.org rebuilders are builders, not rebuilders.
    • That's why I called 96% (or whatever) a "lie".
    • Up until recently we had two main blockers for rebuilders:
      • >3000 packages without .buildinfo files, fixed by myself in February 2021 and in June 2022.
      • snapshot.debian.org was (and is) unusable for rebuilds, fixed by Frédéric Pierret and josch since June 2021, by providing a partial mirror for amd64 only and only going back until January 2017.

    CI versus rebuilds:

    • https://beta.tests.reproducible-builds.org/debian is showing rebuilds of ftp.debian.org - huge thanks to Frédéric Pierret for this PoC.
    • Sadly, Frédéric's rebuilder is down atm...
    • And one rebuilder is not enough anyway. It's a start though:

    https://beta.tests.reproducible-builds.org/debian


      unreproducible in build-essential:
    • linux
    • gcc

    https://beta.tests.reproducible-builds.org/debian

    • amd64 only, also because our snapshot mirror is amd64 only
    • one rebuilder only, not several (and at least some should run on Debian resources)
    • one person maintaining this so far. Thank you very much, Frédéric Pierret, and sorry too.

    working around snapshot.debian.org

    • snapshot.debian.org was (and is) unusable for rebuilds, fixed by Frédéric Pierret and josch since June 2021, by providing a partial mirror for amd64 only and only going back until January 2017.
    • without "a working" snapshot.debian.org (it works, "just" not for our usecases) we cannot have reproducible Debian...
    • sadly snapshot.notset.fr is currently down and snapshot.reproducible-builds.org is not yet up... :/

    improvements to our snapshot.debian.org mirror

    • soon to be hosted at OSUOSL as snapshot.reproducible-builds.org
    • we want at least arm64 too, though that needs more than just HW. See the MR above.

    "Solved" problems with .buildinfo files

    • we had >3000 packages without .buildinfo files, I NMUed all of them (with the help of David Bremner!) 😇 Just NEW ones will keep coming...
    • buildinfos.debian.net is just a proof of concept, but it works around #862073, #763822, #862538, #929397 well enough.
    • GPG keys expire, so we just ignore signatures...

    And then, meaningful reproducibility of Debian is still not possible because:

    • linux, gcc and glibc are our current blockers getting build-essential reproducible in bookworm.
    • Debian installer images are not reproducible in bullseye.
    • Debian Live images are not reproducible in bullseye.
    • Sadly "bullseye" was not a typo in the last two lines. :(

    meaningful reproducibility of Debian d-i images
    (amd64 only)

    • Debian installer images are reproducible when built from git, as shown by Roland Clobus. The problem here is that automated testing of d-i images fails almost constantly in sid and testing...

    meaningful reproducibility of Debian live images
    (amd64 only)

    • Debian Live images are reproducible using live-build, as shown by Roland Clobus.
      • reproducible package installation != reproducible packages
      • future of Debian live images uncertain, though we have 3 choices now: none, unreproducible or reproducible.

    more on d-i and live images

    • Roland Clobus gave a talk at the Debian Reunion Hamburg about his efforts to revive live-images.
    • Roland and Phil Hands are working together to get those images tested for functionality as well, using https://openqa.debian.net.
    • There's a "Debian installer and images team BoF" happening now, though I don't know if live images will be covered there.

    other issues, release team area

    • We are very happy that testing migration is blocked for binary uploads.
    • We very much like the idea of accelerating migration for reproducibility.
    • Debian policy: too early for "must", but maybe for trixie we can have "must not regress"?

    other issues, salsa CI related

    • "btw", reprotest is basically unmaintained upstream.

    bookworm goals

    6 months until the freeze.
    • 0 packages without .buildinfo files.
    • build-essential reproducible.
    • d-i images reproducible.
    • live images reproducible.
    • more archs on our snapshot mirror (arm64?).
    • a 2nd rebuilder of ftp.debian.org. and a 3rd...

    trixie goals

    • snapshot.debian.org usable for mass rebuilds by many users for all architectures.
    • more rebuilders! (instead of more CI builders)
    • 0 bugs with patches unuploaded. Currently there are 292 of these. 2 NMUs per week, uploaded to DELAYED/15.
    • #863622: apt: warn when installing packages that are not reproducible
    • .buildinfo files known and used by dak.

    post trixie goals

    • I still haven't found what I'm looking for...!
    • 100% reproducible packages and distributed images for trixie+1?
    • What else?
