Collaborative Working Sessions - Tools
Reproducible Builds Summit 2022
Tools conversation
- request for diffoscope to support the “nar” (Nix ARchive) format.
- congratulations and thanks for diffoscope: for some people here it is the “main tool”! they could not live without it!
- … also from the same person: “are there improvements you would like to see?” “yeah.”
- diffoscope can be quite verbose in its reporting. with large results, it can become difficult to see the high-level picture.
- perhaps it would be nice to have some human-readable explanation of some of the kinds of changes which might be recognizable?
- (it was admitted that it is not clear how far this is possible)
- for example can we guess if this pattern of change indicates that certain compiler flags have been used?
- would love to have: a source code mirroring tool!
- example user story: I have an openWRT from 15 years ago that I want to rebuild… and though I have some instructions… the source URLs may have moved or disappeared, and this stops me.
- this is just one example: openWRT is not special in this. source code is source code; we all need this.
- is this the goal of the Software Heritage Foundation (SWH)?
- Possibly!
- Do we trust them? should there be more decentralization of this?
- want our own instance!
- anecdote: we know some projects stop hosting their own source releases if they think those releases contain a security vuln.
- entire table agrees: this is very very bad behavior. archives should be kept!!
- “but surely it’s still in their VCS”
- unclear!
- some projects produce “source” tarballs… but in these, e.g., autoconf may already have been run… and maybe that output ISN’T in the VCS. uh roh.
- anecdote: git archive behavior changes (so an archive generated from the same commit may not stay byte-identical).
- discussed that we want this to work such that you ask the service for the source snapshot identified by a hash… this makes it easier to mirror, and does not require much trust.
- for example deb source files solve this… for debian.
- this also isn’t necessarily archival or complete: it doesn’t necessarily contain older releases!
- emphasized that this solution is for debian and doesn’t solve it for others.
- nixOS and guix “substitute servers” may also contain this kind of snapshotted content.
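A minimal sketch of the fetch-by-hash idea, assuming SHA-256 as the identifier and modelling each mirror as a plain function from digest to bytes (the names `fetch_by_hash` and `Mirror` are hypothetical). Because the name is derived from the content, the client can check whatever a mirror returns, so the mirrors themselves need little trust.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical model of a mirror: takes a hex digest, returns the blob or None if missing.
Mirror = Callable[[str], Optional[bytes]]


def fetch_by_hash(digest: str, mirrors: Iterable[Mirror]) -> bytes:
    """Return the first blob from any mirror whose SHA-256 matches the requested digest."""
    for mirror in mirrors:
        data = mirror(digest)
        if data is None:
            continue  # this mirror does not have the snapshot
        if hashlib.sha256(data).hexdigest() == digest:
            return data  # content matches its name, so the mirror need not be trusted
        # Mismatch: corrupted or tampered copy; fall through and try the next mirror.
    raise LookupError(f"no mirror returned valid content for {digest}")
```

Any number of independent mirrors (an archive, a distro's source store, a local cache) can be listed; a bad copy from one of them is simply skipped.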
- part of the problem is mirroring the content… part of the problem is seeing the names used to refer to stuff, and then snapshotting that as well.
- this is kinda like transparency logs! like Certificate Transparency!
- want something like e.g. transparency logs for maven-central!
- tor has some scripts that look at maven-central and the pom files and hash them, and they do redistribute the hashes.
- they download the thing the first time, and remember the hash…
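To make the transparency-log comparison concrete, here is a much-simplified, hypothetical append-only log: each entry binds a release name to a content hash and includes the hash of the previous entry, and publishing an already-used name with different content is refused. Real Certificate Transparency logs use Merkle trees and signed tree heads; a plain hash chain like this only illustrates the idea, and `NameLog` is an invented name.

```python
import hashlib
import json

GENESIS = "0" * 64  # head value of an empty log


def _entry_digest(prev: str, name: str, content_hash: str) -> str:
    body = {"prev": prev, "name": name, "hash": content_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class NameLog:
    """Hypothetical append-only log binding release names to content hashes."""

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self.names: dict[str, str] = {}

    def head(self) -> str:
        return self.entries[-1]["entry_hash"] if self.entries else GENESIS

    def append(self, name: str, content_hash: str) -> str:
        known = self.names.get(name)
        if known is not None:
            if known != content_hash:
                # Same name, different content: this is a re-release, refuse it.
                raise ValueError(f"{name} was already published with different content")
            return self.head()  # identical republish: nothing new to record
        prev = self.head()
        # Each entry commits to the previous head, so history cannot be rewritten silently.
        entry_hash = _entry_digest(prev, name, content_hash)
        self.entries.append(
            {"prev": prev, "name": name, "hash": content_hash, "entry_hash": entry_hash}
        )
        self.names[name] = content_hash
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain from the start; False if any entry was altered."""
        prev = GENESIS
        for e in self.entries:
            digest = _entry_digest(e["prev"], e["name"], e["hash"])
            if e["prev"] != prev or e["entry_hash"] != digest:
                return False
            prev = digest
        return True
```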
- the future build scripts will check the hash.
- (somewhat complex: either a hash is checked, or a sig is checked.)
- (note that the signature files, i.e. the “.asc” files, are not redistributed right now. maybe they should be.)
- the hash is stored in the tor git repo after first being seen.
- the future build scripts still do the download themselves.
- “can I have that?!” “we do not do the cleverest thing. we would like to improve that.”
- “very simple txt files. only the things we care about.”
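The first-download-then-check flow described for tor can be sketched as follows. This is not tor's actual script: the pin file format (sha256sum-style `digest  name` lines) and the function names are assumptions chosen for illustration. The pattern is trust on first use: the first observed hash is recorded (e.g. committed to the repo), and every later download must match it.

```python
import hashlib
from pathlib import Path


def load_pins(pin_file: Path) -> dict[str, str]:
    """Read 'sha256  name' lines (assumed sha256sum-style format) into name -> digest."""
    pins: dict[str, str] = {}
    if pin_file.exists():
        for line in pin_file.read_text().splitlines():
            if line.strip():
                digest, name = line.split(maxsplit=1)
                pins[name] = digest
    return pins


def check_or_pin(name: str, data: bytes, pin_file: Path) -> str:
    """Trust on first use: record the hash the first time, enforce it afterwards."""
    digest = hashlib.sha256(data).hexdigest()
    pins = load_pins(pin_file)
    if name not in pins:
        # First sighting: remember the hash; the file can then be committed to git.
        with pin_file.open("a") as f:
            f.write(f"{digest}  {name}\n")
        return "pinned"
    if pins[name] != digest:
        raise ValueError(f"hash mismatch for {name}: expected {pins[name]}, got {digest}")
    return "verified"
```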
- arch linux stores hashes in their build scripts like this sometimes as well, we think!
- bazel has some fetching functions (e.g. http_archive) that take a hash as an optional parameter next to a URL.
- it’s a shame it’s optional
- the UX of this is… you have to find a hash somewhere and copy-paste it.
- where?
- this is a manual, human step!
- in cases where git is used, sometimes simply using the git commit hash seems like a good option.
- tor notes that they use these often, when possible.
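Part of why git hashes work as source identifiers is that git object IDs are derived from content. As an illustration, this is how git (in its default SHA-1 object format; git also has an experimental SHA-256 format) computes the ID of a single file's contents, a "blob". A commit ID transitively commits to the tree of such blobs. The function name is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID git assigns to a file's contents (SHA-1 object format)."""
    # git hashes a header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello\n"))  # same value `git hash-object` reports for a file containing "hello\n"
```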
- it seems many people have some scripts for doing this, but no one is proud of them enough to share :)
- there are possibly two parts to the mirroring problem…
- having and mirroring the content blobs is one task
- seeing the names that are used for this content is the other.
- what if there was a foundation that maintained version names and made sure people don’t re-release things?
- what about tools for rebuilding?
- some people are not sure it really seems possible to share tools at this level. many opinions.
- we notice reprobuilder tools for debian were topics in previous years but did not really show up on the topic board at all today…
- could we have some linters for things like -Werror=date-time?
- something like the example above already exists, we think, but we would like more like this!
- reproducible builds on windows seem to have less representation at this table
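A toy version of such a linter, assuming C/C++ sources: it flags the same macros that GCC's -Wdate-time warns about (`__DATE__`, `__TIME__`, `__TIMESTAMP__`). It is a regex scan rather than a preprocessor, so it also reports hits inside comments and strings; `find_date_macros` is a hypothetical name.

```python
import re

# The macros that GCC's -Wdate-time warns about; \b avoids matching inside longer identifiers.
DATE_MACROS = re.compile(r"\b(__DATE__|__TIME__|__TIMESTAMP__)\b")


def find_date_macros(source: str) -> list[tuple[int, str]]:
    """Return (line number, macro) pairs. Naive: also matches in comments and strings."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in DATE_MACROS.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```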
- some people here have done it!
- not much documentation is shared. potential room for improvement!
- “black magic”, “sparse information on websites”
- e.g. there are some checksum fields in the windows PE executable format that need to be normalized, and this is only described in some blog posts.
- going on the reproducible builds website was some help!
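One way to handle those PE fields when comparing two builds is to zero them first. A minimal sketch, assuming a well-formed PE file: it zeroes the COFF header TimeDateStamp and the optional-header CheckSum (offset 64 in both PE32 and PE32+). The function name is hypothetical; a real tool might instead recompute the checksum, and other places (e.g. debug directory entries) may also carry timestamps.

```python
import struct


def normalize_pe(data: bytes) -> bytes:
    """Zero the COFF TimeDateStamp and optional-header CheckSum of a PE image.

    Meant for comparing builds, not for producing a binary to ship.
    """
    buf = bytearray(data)
    if buf[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    (pe_offset,) = struct.unpack_from("<I", buf, 0x3C)  # e_lfanew: offset of "PE\0\0"
    if buf[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4                            # COFF file header follows the signature
    struct.pack_into("<I", buf, coff + 4, 0)        # TimeDateStamp
    optional = coff + 20                            # optional header follows the 20-byte COFF header
    struct.pack_into("<I", buf, optional + 64, 0)   # CheckSum (same offset for PE32 and PE32+)
    return bytes(buf)
```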
- discussion about the difference between controlling the supply of content to the build vs. controlling and describing the build environment
- “do you have buildinfo files”
- it is interesting that gcc may be introducing new sections in ELF files which describe the build environment.
- this could be terrible?! or it could make some parts of life easier.
- depends on what information ends up being stored there, exactly!
- if they embed library version info, we welcome that!
- if they embed “debian vs redhat”: “who cares”
- if they embed kernel version: please do not!
- if they embed timestamps: please do not!!!!!
- “debuginfod” is a service, from a few years ago, which serves debug info from a server
- part of a long evolution of systems which put less debug info in the executable binaries themselves. (e.g. debug symbols began to be put in separate files, starting a few years ago, or at least it is possible to do so.)
- reprobuilder / reprotest: have you heard of it? what would you want from it?
- some have not heard of it. some have.
- does it pay off?
- it has some features to make it easy to use with debian, but it’s supposed to be generic
- you have an “exec this” option as well, which should be general.
- argument that this isn’t very useful, because by the time I’ve set up a whole system to hand to it, I’ve done a bigger piece of work than reprobuilder does.
- it has a nice suite of things it will aggressively vary!
- that’s nice!
- sometimes. some of the things it can vary, some people do not care about.
- example: hostname. several people state they have no problem with just hardcoding a hostname, and don’t care.
- example: timestamps: these already vary quite naturally so usually it’s not the biggest need.
- you can disable these if you don’t find them interesting and don’t want to waste time on them, but arguably it is still complexity that someone has to maintain or know about without using it.
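The core idea of varying the environment between builds and comparing the outputs can be sketched like this. It is not reprotest's interface: `build` is a hypothetical callable that runs the build with a given environment and returns the artifact bytes, and the variation list is illustrative only. The `skip` parameter mirrors the point above about disabling variations (such as hostname) that a project does not care about.

```python
import hashlib
from typing import Callable, Mapping

# Illustrative variations only, in the spirit of reprotest; not its actual list.
VARIATIONS: dict[str, dict[str, str]] = {
    "timezone": {"TZ": "Etc/GMT-14"},
    "locale": {"LC_ALL": "fr_FR.UTF-8"},
    "home": {"HOME": "/nonexistent/second-build"},
}


def check_variations(
    build: Callable[[Mapping[str, str]], bytes],
    base_env: Mapping[str, str],
    variations: Mapping[str, Mapping[str, str]],
    skip: frozenset[str] = frozenset(),
) -> dict[str, bool]:
    """Build once with base_env, then once per enabled variation; True = identical output."""
    reference = hashlib.sha256(build(base_env)).hexdigest()
    results: dict[str, bool] = {}
    for name, overrides in variations.items():
        if name in skip:
            continue  # a variation this project has chosen not to care about
        env = {**base_env, **overrides}
        results[name] = hashlib.sha256(build(env)).hexdigest() == reference
    return results
```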
- misc things
- tor has ended up needing faketime for something on macOS
- signatures are sometimes a problem for reproducibility, especially when embedded in bundles.
- the F-Droid people have worked on some scripts for taking signatures out of android packages and putting them back in!
- sometimes a “published private key” is used.
- it would be nice if we could convince more tools to store no time info at all, instead of using SOURCE_DATE_EPOCH
- the spec does say already that you should only use SOURCE_DATE_EPOCH as a last resort… but. well.
- idea: should we perhaps amend the spec to say “if SOURCE_DATE_EPOCH=10000” (or some random number we select), “then that means please store no time at all”.
- same wishes as already, but nudge people more explicitly to support this.
- would make it visible if a tool listens to SOURCE_DATE_EPOCH but shouldn’t.
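How a tool might honour this proposal can be sketched as follows. The sentinel value 10000 is the example from the discussion and is NOT part of the SOURCE_DATE_EPOCH specification; `NO_TIMESTAMP_SENTINEL` and `timestamp_to_embed` are hypothetical names.

```python
import os
import time
from typing import Mapping, Optional

# HYPOTHETICAL: the example value from the proposal above; not in the SOURCE_DATE_EPOCH spec.
NO_TIMESTAMP_SENTINEL = 10000


def timestamp_to_embed(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return the Unix time a tool should embed, or None if it should store no time at all."""
    env = os.environ if env is None else env
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return int(time.time())  # unset: fall back to the current time (not reproducible)
    epoch = int(raw)  # a malformed value raises ValueError instead of being ignored
    if epoch == NO_TIMESTAMP_SENTINEL:
        return None  # proposed meaning: omit the timestamp entirely
    return epoch
```

This also matches the last point: a tool that embeds the sentinel anyway produces a timestamp of 1970-01-01 02:46:40 UTC, which is easy to spot.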