Rebuilding what is distributed from ftp.debian.org



Holger Levsen
MiniDebConf Toulouse / Capitole du Libre
2024-11-16, Toulouse, France
lunarⒶdebian.org / https://lunar.anargeek.net

Reproducible Builds,
rebuilding what is distributed from ftp.debian.org



Who am I

  1. Holger Levsen / holger@debian.org, located in Hamburg, Germany. Born at 329 ppm. He/him. 🏳️‍🌈🏳️‍⚧️🖤😷
  2. Debian user since 1995, contributing since 2001, Debian member since 2007. I ❤️ Debian.
  3. Working on Reproducible Builds since 2014. Aiming to make all ❤️ Free Software reproducible.
  4. Ask me anything, anytime. This is a pretty complex topic.
  5. I'm here to present the work of many people:

according to https://reproducible-builds.org/who/people/

akira • Alexander Bedrossian • Alexander Borkowski • Alexander Couzens (lynxis) • Alexis Bienvenüe • Alex Wilson • Allan Gunn (gunner) • Amit Biswas • Anders Kaseorg • Andrew Ayer • anonmos1 • Anoop Nadig • Arnout Engelen • Asheesh Laroia • Atharva Lele • Ben Hutchings • Benjamin Hof • Bernhard M. Wiedemann • Boyuan Yang • Brett Smith • Calum McConnell • Carl Dong • Ceridwen • Chris Lamb • Chris Smith • Christoph Berg • Christopher Baines • Chris West • Cindy Kim • Clemens Lang • Clint Adams • Dafydd Harries • Daniel Edgecumbe • Daniel Kahn Gillmor • Daniel Shahaf • Daniel Stender • David A. Wheeler • David Bremner • David del Amo • David Prévot • David Suarez • Dhiru Kholia • Dhole • Drakonis • Drew Fisher • Ed Maste • Edward Betts • Eitan Adler • Elio Qoshi • Eli Schwartz • Emanuel Bronshtein • Emmanuel Bourg • Esa Peuha • Evangelos Ribeiro Tzaras • Fabian Keil • Fabian Wolff • Felix C. Stegerman • Feng Chai • Frédéric Pierret (fepitre) • Georg Faerber • Georg Koppen • Gonzalo Bulnes Guilpain • Graham Christensen • Greg Chabala • Guillem Jover • Hannes Mehnert • Hans-Christoph Steiner • Harlan Lieberman-Berg • heinrich5991 • Helmut Grohne • Hervé Boutemy • Holger Levsen (h01ger) • HW42 • Ian Muchina • intrigeri • jajajasalu2 • Jakub Wilk • James Fenn • Jan Nieuwenhuizen • Jan-Benedict Glaw • Javier Jardón • Jelle van der Waa • Jelmer Vernooij • Jérémy Bobbio (lunar) • Jochen Sprickerhof • Johannes Schauer Marin Rodrigues • John Neffenger • John Scott • Joshua Lock • Joshua Watt • Juan Picca • Juri Dispan • Justin Cappos • kpcyrd • Kushal Das • Levente Polyak • Linus Nordberg • Liyun Li • Ludovic Courtès • Lukas Puehringer • Maliat Manzur • marco • Marco Villegas • MarcoFalke • Marcus Hoffmann (bubu) • Marek Marczykowski-Górecki • Maria Glukhova • Mariana Moreira • marinamoore • Martin Suszczynski • Mathieu Bridon • Mathieu Parent • Mattia Rizzolo • Michael Pöhn • Mike Perry • Morten Linderud • Muz • Mykola Nikishov • Nick Gregory • Nicolas Boulenguez • Nicolas 
Vigier • Niels Thykier • Niko Tyni • Oejet • Omar Navarro Leija • opi • Orhun Parmaksiz • Oskar Wirga • Paul Gevers • Paul Spooren • Paul Wise • Peter Conrad • Peter De Wachter • Peter Wu • Philip Rinn • Pol Dellaiera • Profpatsch • Rahul Bajaj • Reiner Herrmann • Richard Purdie • Robbie Harwood • Roland Clobus • Russ Cox • Santiago Torres • Santiago Vila • Sascha Steinbiss • Satyam Zode • Scarlett Clark • Sebastian Crane • Seth Schoen • Simon Butler • Simon Josefsson • Simon Schricker • Snahil Singh • Stefano Rivera • Stefano Zacchiroli • Stéphane Glondu • Steven Adger • Steven Chamberlain • Sune Vuorela • Sylvain Beucler • Thomas Vincent • Tianon Gravi • Tim Jones • Tobias Stoeckmann • Tom Fitzhenry • Ulrike Uhlig • Vagrant Cascadian • Valentin Lorentz • Valerie R Young • Vipul • Wookey • Ximin Luo

About you

  • Who knows about Reproducible Builds, why and how?
  • Who contribute(s|d) to Reproducible Builds?
  • Who knows that Reproducible Builds have been known for more than 10 years? >30 years?
  • Who knows about SBOM? (Software Bill of Materials) ~= our .buildinfo files from 2014!

Introduction

The problem

  • The source code of free software is available,
  • …but most people install pre-compiled binaries.
  • No one really knows how those binaries correspond to the source (not even those building them).
  • As a result there are various classes of supply chain attacks.

https://reproducible-builds.org/docs/definition/

  • When is a build reproducible?
  • A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.
  • The relevant attributes of the build environment, the build instructions and the source code as well as the expected reproducible artifacts are defined by the authors or distributors. The artifacts of a build are the parts of the build results that are the desired primary output.
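
As a minimal sketch of what "bit-by-bit identical" means in practice, the following compares two build outputs by their SHA-256 digests. The function names and file paths are hypothetical and not part of any Reproducible Builds tool.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(artifact_a, artifact_b):
    """True only if both artifacts have identical contents (same digest)."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

A call such as `is_reproducible("build1/foo.deb", "build2/foo.deb")` (hypothetical paths) would then tell whether two independent builds produced the same artifact.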

Our mission

  • Enable anyone to independently verify that a given source produces bit by bit identical results.
  • Reproducible Builds are an important building block in making supply chains more secure. Nothing more, nothing less.
  • (Un)secure software built reproducibly still remains (un)secure software. However, with reproducible builds you can be sure that you are running the software you want to be running, built from the sources you want to be using.

  • Most people will probably say: what does that even mean?



Our new slogan in the making...

  • Enabling supply chain security.

By 2024 Reproducible Builds has been widely understood:


  • https://reproducible-builds.org/resources/
    https://reproducible-builds.org/docs/
    https://reproducible-builds.org/docs/publications/
  • https://www.whitehouse.gov/briefing-room/statements-releases/2021/06/08/...
    • requires "Software Bill of Materials" (SBOMs) for governmental software
    • so far only recommends reproducible builds / verified SBOMs

How did we get there?

  • Money
  • Edward Snowden

Why money?

  • Bitcoin (the software) was made reproducible in 2011.

Why Snowden?

  • Well...after Snowden:
  • Torbrowser was made reproducible in 2013 by Mike Perry.
  • That's Firefox. One of the biggest software projects in the world.

How did we really get there?

  • Money / Bitcoin
  • Edward Snowden / Torbrowser
  • ...and a LOT of work by MANY people over MANY years.

    2013 and 2014

    • Lunar hosted a brainstorming meeting at DebConf13.
    • and another one at DebConf14
    • patches for dpkg: sorting fixes and .buildinfo files (SBOM!)
    • in September 2014 I started systematic builds of Debian packages, twice. First just 100 packages, then all of them.
    • Mike Perry and Seth Schoen gave a presentation at CCCongress in December 2014 showing "my" graphs. Wow.

    Debian unstable, 20150131

    2015

  • FOSDEM talk by Lunar and myself, inviting the free software world to collaborate and tackle this problem.
  • CCCamp presentation by Lunar, showing many problems and their solutions.
  • 1st Reproducible Builds Summit in Athens.
  • SOURCE_DATE_EPOCH spec
  • debbindiff by Lunar, later renamed diffoscope (by Lunar and ~84 other contributors)

    Common reasons for unreproducibilities:

  • timestamps, timestamps, timestamps
  • timestamps, timestamps, timestamps
  • build paths, build paths
  • all the rest
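
To illustrate how timestamps and ordering leak into build outputs, here is a small sketch that writes a tar archive whose bytes do not depend on the clock or on the order in which files are listed. It uses Python's standard `tarfile` module; the function name, the fixed owner and mode, and the choice of the GNU tar format are assumptions made for this illustration, not how dpkg or any Debian tool does it.

```python
import io
import tarfile


def deterministic_tar(files, mtime):
    """Build a tar archive from a {name: bytes} mapping.

    Entries are written in sorted order with a fixed mtime, owner and mode,
    so the same inputs always yield the same archive bytes.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for name in sorted(files):  # stable ordering instead of readdir order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # clamp instead of using the build time
            info.uid = info.gid = 0  # do not leak the building user
            info.uname = info.gname = ""
            info.mode = 0o644
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

Passing the same files with a different `mtime` changes the archive bytes, which is exactly the kind of variation that makes a package unreproducible.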

    Resources about unreproducibilities:

    • 430 known issue types in reproducible-notes.git
    • Lunar's talk at CCCamp 2015
    • https://reproducible-builds.org/docs/
    • It's much easier to show common pitfalls making a package unreproducible than the opposite:
      • https://github.com/bmwiedemann/theunreproduciblepackage

    SOURCE_DATE_EPOCH

    • Build timestamps are largely meaningless. SOURCE_DATE_EPOCH describes the time of the last modification of the source (in seconds since the Unix epoch).
    • Supported by a lot of software today.
    • The specification is from 2015 and was updated in 2017.
    • https://reproducible-builds.org/docs/source-date-epoch/
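
A build tool that wants to embed a date can honour SOURCE_DATE_EPOCH roughly as follows. This is a sketch, assuming the variable holds an integer number of seconds since the Unix epoch as described above; the function name `build_timestamp` and its `environ` parameter are hypothetical.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp to embed in build output, in UTC.

    Uses SOURCE_DATE_EPOCH when it is set and falls back to the
    current time otherwise.
    """
    environ = os.environ if environ is None else environ
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

Formatting the result in UTC rather than local time also avoids a timezone-dependent output.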

    diffoscope

    • Who uses or has used diffoscope?
    • diffoscope tries to get to the bottom of what makes files or directories different. It will recursively unpack archives of many kinds and transform various binary formats into more human-readable form to compare them.
    • https://try.diffoscope.org
    • https://diffoscope.org
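
For scripted comparisons, diffoscope can be driven from Python via `subprocess`. The sketch below assumes diffoscope is installed, that its `--html` option writes an HTML report to the given file, and that it exits with 0 when the inputs are identical and 1 when they differ. The helper names and file names are hypothetical.

```python
import shutil
import subprocess


def diffoscope_command(first, second, html_report=None):
    """Build the diffoscope command line for comparing two paths."""
    command = ["diffoscope"]
    if html_report is not None:
        command += ["--html", html_report]
    return command + [first, second]


def compare(first, second, html_report=None):
    """Run diffoscope; return True if identical, False if they differ."""
    if shutil.which("diffoscope") is None:
        raise RuntimeError("diffoscope is not installed")
    result = subprocess.run(diffoscope_command(first, second, html_report))
    if result.returncode not in (0, 1):
        raise RuntimeError(f"diffoscope failed with exit code {result.returncode}")
    return result.returncode == 0
```

For example, `compare("build1/foo.deb", "build2/foo.deb", html_report="foo.html")` would compare two hypothetical build results and leave a browsable report behind.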

    diffoscope

  • Text and HTML output
  • File formats supported include: Android APK files, Android boot images, Android package resource table (ARSC), Apple Xcode mobile provisioning files, ar(1) archives, ASM Function, Berkeley DB database files, bzip2 archives, character/block devices, ColorSync colour profiles (.icc), Coreboot CBFS filesystem images, cpio archives, Dalvik .dex files, Debian .buildinfo files, Debian .changes files, Debian source packages (.dsc), Device Tree Compiler blob files, directories, ELF binaries, ext2/ext3/ext4/btrfs/fat filesystems, Flattened Image Tree blob files, FreeDesktop Fontconfig cache files, FreePascal files (.ppu), Gettext message catalogues, GHC Haskell .hi files, GIF image files, Git repositories, GNU R database files (.rdb), GNU R Rscript files (.rds), Gnumeric spreadsheets, GPG keybox databases, Gzipped files, Hierarchical Data Format database, HTML files (.html), ISO 9660 CD images, Java class files, Java .jmod modules, JavaScript files, JPEG images, JSON files, Linux kernel images, LLVM IR bitcode files, local (UNIX domain) sockets and named pipes (FIFOs), LZ4 compressed files, lzip compressed files, macOS binaries, Microsoft Windows icon files, Microsoft Word .docx files, Mono ‘Portable Executable’ files, Mozilla-optimized .ZIP archives, Multimedia metadata, OCaml interface files, Ogg Vorbis audio files, OpenOffice .odt files, OpenSSH public keys, OpenWRT package archives (.ipk), PDF documents, PE32 files, PGP signatures, PGP signed/encrypted messages, PNG images, PostScript documents, Public Key Cryptography Standards (PKCS) files (version #7), Python pyc files, RPM archives, Rust object files (.deflate), Sphinx inventory files, SQLite databases, SquashFS filesystems, symlinks, tape archives (.tar), tcpdump capture files (.pcap), text files, TrueType font files, U-Boot legacy image files, WebAssembly binary module, XML binary schemas (.xsb), XML files, XMLB files, XZ compressed files, ZIP archives and Zstandard compressed files.
  • Fallback on hexdump comparison, fuzzy-matching to handle renamings, and much more!

    diffoscope example output

  • Example diffoscope output for https-everywhere 5.0.6 vs 5.0.7
  • https://reproducible-builds.org

    Reproducible Builds Summits

    • 2015 Athens
    • 2016 Berlin
    • 2017 Berlin
    • 2018 Paris
    • 2019 Marrakech
    • 2022 Venice
    • 2023 Hamburg
    • 2024 Hamburg
    • 2025 location needed!

    Projects at Reproducible Builds Summits

    Alpine Linux, Apache Maven, Apache Security, Arch Linux, baserock, Bazel, bootstrappable.org, Buildroot, CHAINS (KTH Royal Institute of Technology), coreboot, CoyIM, Debian, Eclipse Adoptium, EdgeBSD, ElectroBSD, F-Droid, Fedora, FreeBSD, GitHub, GNU Guix, GNU Mes, Google, Guardian Project, Homebrew, Huawei, Indiana University (IU), in-toto, IPFS, JustBuild, LEAP, LEDE, LibreOffice, Linux, MacPorts, Max Planck Institute for Security and Privacy (MPI-SP), Microsoft, MirageOS, Mobian, NetBSD, New York University (NYU), NixOS, Octez / Tezos, openSUSE, OpenWrt, pantsbuild.org, phosh, pkgsrc, privoxy, Project, Pure OS, Qubes OS, Quinel Ltd, rebuilderd, Red Hat, repeatr.io, riot-os.org, Rust, Software Freedom Conservancy, spytrap-adb, subuser.org, systemd, Tails, Tor Project, Ubuntu, University of Pennsylvania (UPenn) and Warpforge.

    (There were more but we were asked to only mention these.)

    Reproducible-builds.org funding

    • r-b.o has been a Software Freedom Conservancy (SFC) project since 2018, currently funding Chris Lamb, Mattia Rizzolo, Vagrant Cascadian, myself & kpcyrd.
    • Funding is needed to support our continuous work: community work, fixing upstreams, developing software, designing processes, the yearly summit...
    • Thank you, CIP, OTF & STF & all past sponsors too ❤️

    Short summary of Reproducible Debian

    Reproducible Builds for some parts of Debian are a reality already today:

    • reproducible docker/podman images: docker.debian.net
    • reproducible live images: cdimage.debian.org
    • individual packages, useful for both developers and some users

    CI results for Debian unstable, 20150131

    CI results for Debian unstable, 20241115

    CI results for Debian trixie, 20241115

    3919 reproducibility related bugs fixed (mostly upstreamed), 298 patches pending...

    37400 bugs in 11 years ~= 9 per day

    we rebuild constantly and find lots of FTBFS bugs

    snapshot.debian.org
    fixed in July 2024!

    🥳

    • Huge thanks to Linus Nordberg and DSA!
    • In the last 2(?) years many snapshots were not imported,
    • also access was severely throttled.
    • There are still some smaller issues but in general the service is finally reliable and usable again.

    Debian testing migration, soon we'll be getting real!

    • 2023: CI reproducible-builds results included in excuses output for Debian testing migration, but there is no penalty nor bonus yet.
    • July 2024: snapshot.debian.org got fixed and we can now do rebuilds where the build is compared against what we distribute on ftp.debian.org instead of CI builds.
    • September 2024: debootsnap and debrebuild (both from devscripts) fixed for good.
    • October 2024: work on https://reproduce.debian.net began.

    How to use debrebuild from src:devscripts in trixie

    • wget https://buildinfos.debian.net/ftp-master.debian.org/buildinfo/2024/01/16/crun_1.13-1_amd64.buildinfo
    • debrebuild --builder=sbuild+unshare crun_1.13-1_amd64.buildinfo
    • voila!
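
After a rebuild, the result still has to be compared with what the .buildinfo file recorded. A minimal sketch of that check, assuming the file carries a `Checksums-Sha256:` field whose continuation lines have the form `<sha256> <size> <filename>`; the function names are hypothetical and this is not how debrebuild itself is implemented.

```python
import hashlib


def buildinfo_sha256(buildinfo_text):
    """Map each file name to the SHA-256 recorded in Checksums-Sha256."""
    checksums = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line.startswith(" "):
            fields = line.split()
            if len(fields) == 3:  # digest, size, file name
                digest, _size, name = fields
                checksums[name] = digest
        else:
            in_field = False  # any other field ends the checksum block
    return checksums


def matches_buildinfo(buildinfo_text, name, rebuilt_bytes):
    """True if the rebuilt file has the digest recorded for `name`."""
    expected = buildinfo_sha256(buildinfo_text).get(name)
    return expected == hashlib.sha256(rebuilt_bytes).hexdigest()
```

If `matches_buildinfo` returns True for every binary package listed, the rebuild reproduced what was originally built.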

    about rebuilderd

    • support for rebuilding Arch, Debian and Tails
    • rebuilderd, rebuilder-worker, rebuilderctl
    • several instances for Arch exist
    • written in Rust by kpcyrd
    • available at https://github.com/kpcyrd/rebuilderd
    • installation with sudo make install, soon with sudo apt install

    https://reproduce.debian.net

    • a rebuilderd instance
    • rebuilding and comparing against what we distribute on ftp.debian.org
    • setup still in its infancy

    https://gitlab.archlinux.org/archlinux/rebuilderd-website

    the difference between theory and practice?

    63% !

    (96% vs 33%)

    why is the difference so large currently?

    • many snapshots specified in .buildinfo files are missing, probably affecting 20% of the archive
    • snapshot.d.o still has some issues, incl. returning broken files, which then are cached...but DSA is working on it!
    • fakeroot not listed in .buildinfo files until recently (~20% as well?)
    • other reasons
    • in the last 24h ~60% of the rebuilds were reproducible
    • 6 weeks ago we were at 23% so 33% is very nice progress in short time 😉

    How to reach 100% in practice

    • 100% reproducible is a political decision and nothing technical.
    • We need to change debian-policy!
    • We can work around 'must-have-offenders' using whitelists in the beginning.
    • The goal is still 100%, whitelists are just a way to achieve that goal eventually.
    • Penalizing testing migration IMO is a means to enforce debian-policy though it can be done before it's policy.

    Debian policy

    • 2017: packages should build reproducibly.
    • 2025? reproducible packages must not regress.
    • 2025? NEW packages must build reproducibly.
    • 2027? packages must build reproducibly.
    • In practice the release team will probably enforce this before it becomes policy. ☺️

    The path to 100%

    suite      reproducible     unreproducible
    stretch    23040 (93.2%)    1514
    buster     26653 (93.9%)    1405
    bullseye   29698 (96.2%)    761
    bookworm   33240 (96.9%)    670
    trixie     35000            256
    forky      40000            128 (but no regressions or new pkgs)
    forky+1    45000            42 policy violations left
    forky+2    50000            0 (?!?!!! that's probably 2031)

    Theory vs Practice

    • In theory, we are done. In practice, we have shown that reproducible builds can be done in theory.
    • Now we need to close the gap between theory and practice.
    • For Debian we now also need to set up more rebuilderd instances, for all architectures (hardware and admins wanted!), and we need rebuilderd to deal with several architectures and then feed that data to britney.
    • Those missing 4-5% in CI are crucial too, or at least 1% of them. For Debian, 1% means 300 source packages...

    Summary, looking forward

    • Many projects support or aim for reproducible builds today. This is a huge success.
    • Next: finish those last 1-5% upstream. (And there are some dragons too, e.g. PGO.)
    • Next: create rebuilderd infrastructure, processes, tools.
    • Also crucial: project-level consensus and commitment to reproducible builds in practice.

    Thank you
    … and all contributors out there!

    Any questions? 🤷

    Holger Levsen <holger@reproducible-builds.org>
    B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C

    22 unreproducible src packages out of the 1337 most popular

    rdma-core jpeg-xl graphviz qtwebengine-opensource-src pipewire tracker colord nss qtbase-opensource-src grub2 libu2f-host apg libdmapsharing libjcat bluez vlc coinor-cgl lynx underscore gegl ffmpeg bind9

    96 build-essential-depends unreproducible source packages (out of 5500)

    maven-shared-utils rdma-core rustc php8.2 jpeg-xl subversion systemtap graphviz r-base pipewire pandas camlp-streams statsmodels colord chromium qt6-declarative nss qtbase-opensource-src python3.12 python3.13 cdi-api gdb pupnp pacemaker scim gcc-13-cross qemu efl bsh cxxtest codenarc fop jsoup hevea libnative-platform-java javaparser black pstoedit lucene4.10 gmetrics emoslib combblas node-function-bind gdcm xmlbeans yarl nbsphinx groovy fltk1.3 ruby-pygments.rb libapache-poi-java ldc doxygen ghc lombok h2database freetds jsch jboss-jdeparser2 dejagnu jts python-nacl servlet-api jaxb ocaml-topkg odc qtremoteobjects-everywhere-src mpich lucene8 secilc bluez vlc valgrind linux86 golang-golang-x-net coinor-cgl parallel lynx underscore asmtools rocm-hipamd libcamera nbconvert mypy petsc node-d3 qtconnectivity-opensource-src eckit eccodes frozenlist extlib emacs ffmpeg bind9 meson-python ipyparallel

    image sources:


    https://mastodon.xyz/@mac_call/113448024657756100


    https://pouet.chapril.org/@infothema/113459042400198201