Collaborative Working Sessions - Tools

Reproducible Builds Summit 2022

Tools conversation

  • request for diffoscope to support the “nar” format.
  • congratulations and thanks for diffoscope – for some people here it is the “main tool”! could not live without it!
  • … also from the same person: “are there improvements you would like to see?” “yeah.”
  • diffoscope can be quite verbose in its reporting. with large outputs it can become difficult to see the high-level result.
    • perhaps it would be nice to have some human-readable explanation of some of the kinds of changes which might be recognizable?
      • (admitted that it is not clear how much this is possible)
      • for example can we guess if this pattern of change indicates that certain compiler flags have been used?
  • would love to have: a source code mirroring tool!
    • example user story: I have an OpenWrt from 15 years ago that I want to rebuild… and though I have some instructions… the source URLs may have moved or disappeared, and this stops me.
      • this is just one example – OpenWrt is not special in this – source code is source code, we all need this.
    • is this the goal of the Software Heritage Foundation (SWH)?
      • Possibly!
      • Do we trust them? should there be more decentralization of this?
        • want our own instance!
    • anecdote: we know some projects stop hosting their own source releases if they think those releases contain a security vuln.
      • entire table agrees: this is very very bad behavior. archives should be kept!!
      • “but surely it’s still in their VCS”
        • unclear!
        • some projects produce “source” tarballs… but these may have had e.g. autoconf run on them… and maybe that output ISN’T in the VCS. uh oh.
    • anecdote: git archive behavior has changed over time, so a regenerated tarball may not match an older one byte-for-byte.
    • discussed: we want this to work such that you ask the service for the source snapshot that is identified by a hash… this makes it easier to mirror, and does not require much trust.
    • for example deb source files solve this… for debian.
      • this also isn’t necessarily archival and total. doesn’t necessarily contain older releases!
      • emphasized that this solution is for debian and doesn’t solve it for others.
    • NixOS and Guix “substitute servers” may also contain this kind of snapshotted content.
    • part of the problem is mirroring the content… part of the problem is seeing the names used to refer to stuff, and then snapshotting that as well.
    • this is kinda like transparency logs! like Certificate Transparency!
      • want something like this for other ecosystems, e.g. maven-central-transparency-logs!
    • tor has some scripts that look at maven-central and the pom files and hashes them, and they do redistribute the hashes.
      • they download the thing the first time, and record the hash…
        • the future build scripts will check the hash.
          • (somewhat complex: either a hash is checked, or a sig is checked.)
            • (note that the signature files – the “.asc” files – are not distributed right now. maybe they should be.)
        • the hash is stored in the tor git repo after first being seen.
        • the future build scripts are still downloading.
      • “can I have that?!” “we do not do the cleverest thing. we would like to improve that.”
        • “very simple txt files. only the things we care about.”
    • arch linux stores hashes in their build scripts like this sometimes as well, we think!
    • bazel has some fetching functions that take a hash as an optional parameter next to a URL.
      • it’s a shame it’s optional
      • the UX of this is… you have to find a hash somewhere and copy-paste it.
        • where?
        • this is a manual, human step!
    • in cases where git is used, sometimes simply using the git source hash seems like a good option.
      • tor notes that they use these often, when possible.
    • it seems many people have some scripts for doing this, but no one is proud of them enough to share :)
  • possible that there are two parts of the mirroring problem…
    • to have and to mirror the content blobs is one task
    • to see the names that are used for this content is also a problem.
  • what if there was a foundation that maintained version names and made sure people don’t re-release things?
  • what about tools for rebuilding?
    • some people are not sure it really seems possible to share tools at this level. many opinions.
    • we notice reprobuilder tools for debian were topics in previous years but did not really show up on the topic board at all today…
  • could we have some linters for things like -Werror=date-time ?
    • something like the example above already exists, we think (GCC’s -Wdate-time warns on uses of __DATE__, __TIME__ and __TIMESTAMP__), but we would like more like this!
  • reproducible builds on windows seem to have less representation at this table
    • some people here have done it!
    • not much documentation is shared. potential room for improvement!
      • “black magic”, “sparse information on websites”
    • e.g. there are some checksum fields in the PE executable format on windows that need to be normalized, and this is only described in some blog posts.
    • going to the reproducible builds website was some help!
  • discussion about difference between controlling supply of content to build vs controlling and describing build environment
    • “do you have buildinfo files”
  • it is interesting that gcc may be introducing new sections in ELF files which describe the build environment.
    • this could be terrible?! or it could make some parts of life easier.
      • depends on what information ends up being stored there, exactly!
    • if they embed library version info, we welcome that!
    • if they embed “debian vs redhat”: “who cares”
    • if they embed kernel version: please do not!
    • if they embed timestamps: please do not!!!!!
  • “debuginfod” is a thing from a few years ago which serves debug info from a server/service
    • part of an evolution of systems for many years now which puts less debug info in executable binaries themselves. (e.g. debug symbols began to be put in separate files, starting a few years ago, or at least it is possible to do so.)
  • reprobuilder / reprotest: have you heard of it? what would you want from it?
    • some have not heard of it. some have.
    • does it pay off?
      • it has some features to make it easy to use with debian, but it’s supposed to be generic
      • you have an “exec this” option as well, which should be general.
        • argument that this isn’t very useful, because by the time I’ve manifested a whole system to hand to that, I’ve already done a bigger piece of work than reprobuilder does.
    • it has a nice suite of things it will aggressively vary!
      • that’s nice!
      • sometimes. some of the things it can vary, some people do not care about.
        • example: hostname. several people state they have no complaint to just hardcode a hostname, and don’t care.
        • example: timestamps: these already vary quite naturally so usually it’s not the biggest need.
        • you can disable these variations if you don’t find them interesting and don’t want to waste time on them, but arguably they are still complexity that someone has to maintain or know about without using.
  • misc things
    • tor has ended up needing faketime for something on macOS
  • signing is a problem for reproducibility sometimes, especially when embedded in bundles.
    • the fdroid people have worked on some scripts for taking signatures out and putting them back in, in android packages!
    • sometimes a “published private key” is used.
  • it would be nice if we could convince more things to store no time info at all, instead of relying on SOURCE_DATE_EPOCH
    • the spec does say already that you should only use SOURCE_DATE_EPOCH as a last resort… but. well.
    • idea: should we perhaps amend the spec to say “if SOURCE_DATE_EPOCH=10000” (or some arbitrary number we select), “then that means please store no time at all”?
      • same wishes as already, but nudge people more explicitly to support this.
      • would make it visible if a tool listens to SOURCE_DATE_EPOCH but shouldn’t.
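
The SOURCE_DATE_EPOCH idea in the last bullets could look roughly like this from a tool's point of view. This is only a sketch of the proposal as discussed: the sentinel value 10000 is the example number from the notes and is not part of the SOURCE_DATE_EPOCH specification, and the function name `timestamp_to_embed` is made up for illustration.

```python
import os
import time
from typing import Mapping, Optional

# Hypothetical sentinel, taken from the example number in the notes.
# It is NOT defined by the SOURCE_DATE_EPOCH specification.
NO_TIMESTAMP_SENTINEL = 10000


def timestamp_to_embed(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the Unix time a tool would embed, or None for "store no time"."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # Variable unset: fall back to the current time (not reproducible).
        return int(time.time())
    value = int(raw)
    if value == NO_TIMESTAMP_SENTINEL:
        # Proposed meaning: please omit the timestamp entirely.
        return None
    return value
```

A tool that listens to SOURCE_DATE_EPOCH without really needing a timestamp would then show up clearly: its output would contain the odd sentinel date instead of no date.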
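
The "download it the first time, record the hash, check it in future builds" approach from the mirroring discussion can be sketched as below. This is a minimal illustration under assumptions, not the actual scripts of tor or anyone else at the table: the names `verify_or_pin` and `HashMismatch` are hypothetical, the `pins` dict stands in for the "very simple txt files" kept in a git repo, and the download step itself is left out.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when content does not match the hash recorded earlier."""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_or_pin(url: str, data: bytes, pins: dict) -> str:
    """Check `data` against the pinned hash for `url`, pinning on first sight.

    `pins` maps a source URL to the hex digest recorded the first time
    that URL was downloaded (a hypothetical stand-in for a text file).
    """
    digest = sha256_hex(data)
    pinned = pins.get(url)
    if pinned is None:
        # First time this name is seen: remember what it pointed to.
        pins[url] = digest
    elif pinned != digest:
        # The name now refers to different content: refuse to build.
        raise HashMismatch(f"{url}: expected {pinned}, got {digest}")
    return digest
```

Because the check only needs the hash, the bytes can come from any mirror without trusting that mirror, which is the "does not require much trust" property mentioned in the discussion.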
