Collaborative Working Sessions - Defining our definitions

We feel that definitions of terms are important to coordinate our work across projects and to be able to communicate both our successes and the work still remaining to be done.

The current Reproducible Builds effort has two commonly cited definitions of “reproducible” – one given on the reproducible-builds.org website, and another (shorter) one which is seen on the group’s T-shirts. But perhaps we need more; and perhaps it is time to revisit those and see if they still serve.

Consensus:

  • Definitions are important
  • We only have one (relatively) clear definition – “reproducible” – but maybe we need more definitions, or some concept of “levels”.
  • The definition we have is evidently not clear enough and may have other problems – evidenced by announcements made by various projects and distributions which recurrently report “X% reproducible”, wherein:
    • percentages do not appear to be meaningfully comparable across distros
    • the percentages reported by projects vary over time (when the exact definition changes to be more or less strict, or something not covered by their previous practical definition changes)
    • it appears that no systems are actually approaching “100%”.

Progress:

Producing new definitions proved difficult.

Brainstorming: potentially relevant terms and concepts that were mentioned included:

  • diverse compilation
  • environmental randomization
  • insignificant environment bits
  • “once” “I” reproduced it (example of a weak definition that we often see used in practice!)
  • bit-for-bit reproducibility (included in current definition – we ratify that we still like this because it is specific and clear)
  • late-discovered un-reproducibility (an unavoidable phenomenon that causes percentages to backslide)
  • circumstantial reproducibility
  • independent reproducibility
  • should we consider different Levels for Outcome Equality?
  • should we consider different Levels for Input Variation?
  • “only 100% reproducibility is useful” (several people agree with this, while acknowledging the irony that no project has attained it)
  • deterministic
  • spurious vs tampering vs unreproducible – degrees of (and reasons for) unreproducibility events
  • “transparently reproducible” (vs “blackbox”?)
  • reliable reproducibility
  • several notes contain drafts of functions…
    • one contains “f(S)=B” – meaning: a function consumes source and produces a binary
    • a later note contains “f(S,SE,I)=As” – meaning: a function consumes source, source environment, (?unknown?), and produces Artifacts (perhaps multiple).
  • Draft of levels?
    • Level 0: unreproducible
    • Level 1: Build at least twice with matching initial conditions, on the same machine, by the same person
    • Level 2: Level 1 plus at least one build varying “X” things (“X” not specified)
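
As an illustration of the function sketches noted above, the later form “f(S,SE,I)=As” could be written roughly as follows. This is a minimal sketch and not something produced in the session: the names `BuildFn`, `digests`, `bit_for_bit_equal`, `reproduces`, `toy_build` and `leaky_build` are hypothetical, the third parameter is passed through as an opaque value because the notes leave its meaning open, and comparing SHA-256 digests is only one possible way to check outputs bit for bit.

```python
import hashlib
import itertools
from typing import Callable, Dict, Mapping

# "As" in the notes: possibly several named artifacts.
Artifacts = Dict[str, bytes]

# f(S, SE, I) = As. The third parameter is kept opaque on purpose:
# the notes do not say what it stands for.
BuildFn = Callable[[bytes, Mapping[str, str], object], Artifacts]


def digests(artifacts: Artifacts) -> Dict[str, str]:
    """Map each artifact name to the SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in artifacts.items()}


def bit_for_bit_equal(first: Artifacts, second: Artifacts) -> bool:
    """True when both builds produced the same names with identical bytes."""
    return digests(first) == digests(second)


def reproduces(build: BuildFn, source: bytes,
               source_env: Mapping[str, str], i: object) -> bool:
    """Run the same build twice on identical inputs and compare outputs."""
    return bit_for_bit_equal(build(source, source_env, i),
                             build(source, source_env, i))


def toy_build(source: bytes, source_env: Mapping[str, str],
              i: object) -> Artifacts:
    """Hypothetical build whose output depends only on its parameters."""
    header = "compiler={}\n".format(source_env.get("compiler", "unknown"))
    return {"out.bin": header.encode() + source}


# Hidden state that is NOT one of the function's parameters.
_hidden_counter = itertools.count()


def leaky_build(source: bytes, source_env: Mapping[str, str],
                i: object) -> Artifacts:
    """Hypothetical build that embeds hidden state in its output."""
    stamp = str(next(_hidden_counter)).encode()
    return {"out.bin": source + stamp}
```

With these toy functions, `reproduces(toy_build, ...)` holds while `reproduces(leaky_build, ...)` does not, because the second function reads state that is not among its parameters. Making such hidden state explicit is one way a one-parameter sketch grows extra parameters.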

Observations, following brainstorming:

  • As the discussion around function sketches continued, the sketches started with one parameter, and then people tended to want to factor out more and more parameters.
    • The distinguishing trait for what got factored out tended to be roughly “which things are difficult to change”.
  • Participants wanted to steer the world by changing the definitions – in two very different directions:
    • Some participants specifically identified wanting to make the definition more concrete in ways that would encourage readers to pick narrower, more attainable steps towards the goal of reproducibility.
    • Other participants wished to make the definition as broad and aspirational as possible (for example immediately encouraging “diverse” compilation, instead of merely repeatable setup and verification of deterministic steps from identical setup conditions).
  • In this session, we were unable to immediately identify clear “levels”.
    • The general idea seems to be that higher levels of reproducibility should involve more variation injection…
    • … but there are many different potential specific axes for this,
    • … and there is no clear ordering in which the different classes of variation could be said to matter more than others (so, ordinal “levels” seem difficult to map to this).
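
One way to picture why ordinal “levels” are hard to map onto variation injection: if each build claim is described by the set of things that were varied, those sets are only partially ordered. The sketch below is an assumption-laden illustration, not a proposal from the session; the axis names “build time” and “build path” and the helper names are hypothetical.

```python
from typing import FrozenSet

VariationSet = FrozenSet[str]


def at_least_as_strong(a: VariationSet, b: VariationSet) -> bool:
    """a is at least as strong as b if it varies everything b varies."""
    return a >= b


def comparable(a: VariationSet, b: VariationSet) -> bool:
    """Two claims can be ranked only if one set contains the other."""
    return a >= b or b >= a


# Hypothetical variation axes, chosen only for illustration.
nothing_varied: VariationSet = frozenset()
varies_time: VariationSet = frozenset({"build time"})
varies_path: VariationSet = frozenset({"build path"})
varies_both: VariationSet = varies_time | varies_path
```

Here `varies_both` is at least as strong as either single-axis set, and every set is at least as strong as `nothing_varied`, but `varies_time` and `varies_path` cannot be ranked against each other. A single numbered scale would have to pick an order between them arbitrarily.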