Collaborative Working Sessions - Packaging

Reproducible Builds Summit 2022

Day 1:

Points to work on:

  • Minimal input for reproducible builds env + instructions (should be as small as possible)
  • The metadata to be provided to the user to ensure that the build is reproducible
  • A handful of ecosystems should be chosen to work on

The package managers to be taken into account: Pip, Stack, Cabal, Debian, FreeBSD, OpenBSD, rpm, Guix, Cargo, Go modules, Gradle, Npm, Docker, OCI, Maven, opam (missing in the group: Ruby Gems, Composer, Nuget)

What reproducible would mean when comparing binaries versus plain “text” such as source code (JavaScript, Python or any other interpreted language): what is distributed? => there is probably a classification of package managers to be found

Once you have a reproducible candidate (the release manager thinks their package should be reproducible), another actor needs to try to verify the reproducibility by rebuilding and checking against the published reference: only when someone else can get the same output is the release marked as “reproducible verified/confirmed”.
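
The verification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names sha256_of and verify_rebuild and the returned status strings are hypothetical and do not come from any tool discussed in the session. The sketch treats "same output" as "identical SHA-256 digest".

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_rebuild(reference_path, rebuilt_path):
    """Hypothetical check run by a second, independent actor.

    The release is only marked as verified when the rebuilt artifact
    is bit-for-bit identical to the published reference.
    """
    if sha256_of(reference_path) == sha256_of(rebuilt_path):
        return "reproducible verified"
    return "not reproduced"
```

A rebuilder would call verify_rebuild with the published artifact and its own build output, and publish the result next to the release.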

Language specific packagers:

  • Pip
  • Stack
  • Cabal
  • Debian, FreeBSD, OpenBSD (Non-language-specific ones)
  • Cargo, Go modules
  • Gradle
  • Npm
  • Docker
  • Maven
  • Nuget
  • Composer

Day 2:

Starting actions:

  • What does reproducible mean?
  • Split them into buckets, what are the particularities
  • What is the metadata we need
  • What are their minimal input (build.spec) and their output (build.info)

Artefact:

  • build.spec - everything that needs to be input for building. Reproducibility will be checked against the reference artefact.
  • build.info - what actually happens. It can contain the hashes of the parts and the hash of everything.
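
As a rough sketch of what "the hashes of the parts and the hash of everything" could look like, the following Python builds a build.info-like record. The function name make_build_info, the field names "parts" and "overall", and the choice of SHA-256 over a sorted JSON encoding are all assumptions made for illustration; the session did not fix a format.

```python
import hashlib
import json


def make_build_info(parts):
    """Build a hypothetical build.info record from {name: bytes} outputs."""
    part_hashes = {
        name: hashlib.sha256(content).hexdigest()
        for name, content in parts.items()
    }
    # Hash of everything: digest over the name-sorted part hashes, so the
    # result does not depend on the order in which parts were produced.
    overall = hashlib.sha256(
        json.dumps(part_hashes, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"parts": part_hashes, "overall": overall}
```

Two builds then agree when their "overall" values match; when they differ, the per-part hashes show which output diverged.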

In the case of Debian, the build.info file provides all the information and can be used as input for debrepro.

From the perspective of reproducibility, we can distinguish the following when it comes to languages:

Reproducibility classes:

  • A - everything is reproducible
  • B - all code is reproducible (functional)
  • C - everything in the programming language is reproducible, except the native dependencies
  • TBD - everything in development is reproducible, dependencies might not be reproducible
  • TBD - naming
  • TBD - reproducibility stamps from different organisations

Practical tests:

  • Herve tried Python to see how reproducible it is: https://github.com/jvm-repo-rebuild/repro-summit
  • Matt hacked on NPM starting with:
    • Npm version
    • Node version
    • Git tag
    • Git repo
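
The four NPM inputs listed above could be captured in a small build.spec-like record, sketched here in Python. The class name NpmBuildSpec, its field names and the JSON serialisation are assumptions for illustration only, not the format Matt used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NpmBuildSpec:
    """Hypothetical minimal build.spec for an NPM package."""

    npm_version: str
    node_version: str
    git_repo: str
    git_tag: str

    def to_json(self):
        # Sorted keys keep the serialised spec stable across runs.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Placeholder values, not taken from a real project.
example_spec = NpmBuildSpec(
    npm_version="8.19.2",
    node_version="18.12.0",
    git_repo="https://example.org/project.git",
    git_tag="v1.0.0",
)
```

Calling example_spec.to_json() yields a text file that a rebuilder could read to recreate the same toolchain and source checkout.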

In Debian’s case, everything that is released is built from scratch, including Java packages for instance. The minimal subset of inputs required to generate reproducible builds is the set of dependencies and some environment variables. The assumption is that the dependencies are safe.