Distribution/compiler bootstrapping
45-minute + 10-minute session on day 1
Problem: distributions use binaries from earlier releases to build their sources. This requires trust in existing binaries (cf. Thompson attack).
Toolchain bootstrap status
Nix, Guix: The toolchain is bootstrapped from a binary archive containing GCC. The source versions are noted, but the process is not even automated.
Guix: Has Scheme implementations of various tools (wget replacements etc.) required for bootstrapping, so as not to rely on binaries. How feasible and scalable is that?
Debian: Builds everything with the maximum number of features, and thus the maximum number of build dependencies. The dependency graph has cycles of size 800, including even packages like Firefox. So-called build profiles (similar to USE flags in Gentoo) reduce this a lot; this is work in progress. With them, the cycle size should be down to 120.
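The effect of build profiles on dependency cycles can be illustrated with a small sketch. It finds strongly connected components (Tarjan's algorithm) in a toy build-dependency graph, then drops the edges that a profile would disable. The package names, the graph and the helper names (largest_cycle, apply_profile) are made up for illustration; this is not how Debian's tooling is implemented.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # package -> set of build dependencies


def strongly_connected_components(graph: Graph) -> List[Set[str]]:
    """Return all strongly connected components (Tarjan's algorithm)."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[Set[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, set()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component: Set[str] = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


def largest_cycle(graph: Graph) -> int:
    """Size of the largest component with more than one package (0 if none)."""
    sizes = [len(c) for c in strongly_connected_components(graph) if len(c) > 1]
    return max(sizes, default=0)


def apply_profile(graph: Graph, dropped: Set[Tuple[str, str]]) -> Graph:
    """Return a copy of the graph without the optional edges a profile disables."""
    return {pkg: {d for d in deps if (pkg, d) not in dropped}
            for pkg, deps in graph.items()}


# Hypothetical full-featured graph: the compiler needs a documentation tool,
# which pulls in a GUI library, which in turn needs the compiler again.
full_graph: Graph = {
    "gcc": {"libc", "doc-tool"},
    "libc": {"gcc"},
    "doc-tool": {"gui-lib"},
    "gui-lib": {"gcc", "libc"},
}
# Hypothetical profile that builds gcc without its documentation.
nodoc_graph = apply_profile(full_graph, {("gcc", "doc-tool")})

print(largest_cycle(full_graph))   # 4
print(largest_cycle(nodoc_graph))  # 2
```

The remaining two-package cycle (compiler and libc) is the kind of core cycle that still has to be broken by other means, such as cross-compilation or staged builds.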
Debian’s approach to bootstrapping: Not a single bootstrapping script, but rather sufficient metadata in packages to allow bootstrapping; see the rebootstrap project led by Helmut Grohne. The starting point is an existing minimal Debian, from which a compiler is then built. Eventual goal: Diverse Double Compilation of a whole distribution. This does not require varying compiler versions, but rather similar (or the same) compiler versions with (very) different bootstrapping histories, e.g. Fedora vs. Debian.
Diverse Double Compilation
Diverse Double Compilation can detect backdoors, unless the same backdoor is present in all of the compilers used.
Problems:
- There are not many compilers to choose from (C++: only two free ones). Is it acceptable to use non-free ones here?
- Compiler source is not the only input (binutils, libc etc.)
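The idea behind Diverse Double Compilation can be sketched with a toy model. A "binary" is described only by the source it was built from, the compiler whose code generation produced it, and whether it carries a self-propagating backdoor. The model assumes deterministic builds and ignores the other inputs listed above (binutils, libc etc.); all names (Binary, compile_with, ddc_matches) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    implements: str  # name of the source this binary was built from
    codegen: str     # source of the compiler that generated its code
    backdoor: bool   # True if a self-propagating trojan is present


def compile_with(compiler: Binary, source: str) -> Binary:
    """Toy model of running `compiler` on compiler source `source`.

    Assumptions: builds are deterministic, the output bytes depend only on
    the source and on the behaviour of the running compiler, and a
    backdoored compiler re-inserts its backdoor into any compiler it builds.
    """
    return Binary(implements=source,
                  codegen=compiler.implements,
                  backdoor=compiler.backdoor)


def ddc_matches(suspect: Binary, source: str, trusted: Binary) -> bool:
    """Rebuild `source` twice, starting from `trusted`, and compare."""
    stage1 = compile_with(trusted, source)  # same behaviour, different bytes
    stage2 = compile_with(stage1, source)   # should equal a genuine self-build
    return stage2 == suspect


clean_a = Binary("A", "A", backdoor=False)        # honest self-built compiler A
trojan_a = Binary("A", "A", backdoor=True)        # A with a Thompson-style trojan
trusted_t = Binary("T", "T", backdoor=False)      # independent second compiler
compromised_t = Binary("T", "T", backdoor=True)   # second compiler, same trojan

print(ddc_matches(clean_a, "A", trusted_t))       # True: binary matches source
print(ddc_matches(trojan_a, "A", trusted_t))      # False: backdoor detected
print(ddc_matches(trojan_a, "A", compromised_t))  # True: backdoor goes unnoticed
```

The last case is the limitation noted above: if the second compiler carries the same backdoor, the comparison succeeds and nothing is detected.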
Task: Define the features required from a minimal base compilation system (shells, Make, etc.), so that we can bootstrap one distribution on another.
Compilers
Other compilers, like GHC, Rust, MIT Scheme: Require specific recent versions of themselves.
Task: What is the most recent version of GCC that can be built with another C compiler?
Task: What is the most recent version of GHC that can be built with another compiler, e.g. Hugs (which would be implemented in C)?
One way out: a GCC frontend for the language, e.g. for Rust.
Better examples:
- Guile: Comes with an interpreter written in C.
- JDK (not much of a problem, can be bootstrapped with GCJ)
Wish: Compiler implementors would provide an easily executable rewrite semantics for their languages, for bootstrapping. But this is likely far too much effort. One selling point would be that it also helps bootstrapping on new architectures.
Certification for compilers: Is there a way to certify compilers? Can we create a standard procedure for bootstrapping a compiler and providing a hash of a known good GCC 4.8 (+ libc, Make) binary, for example?
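The consuming side of such a published hash could look roughly like the following sketch, which checks a set of seed binaries against pinned SHA-256 digests. The function names and the idea of a path-to-digest mapping are assumptions made for illustration; no such standard procedure or published digest exists in these notes.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_seeds(expected: Dict[str, str]) -> List[str]:
    """Return the paths whose digest differs from the pinned one.

    `expected` maps a file path to the hex digest published for it
    (hypothetical; e.g. one entry each for the compiler, libc and Make).
    """
    return [path for path, digest in expected.items()
            if sha256_of(Path(path)) != digest]
```

An empty list from mismatched_seeds means every seed binary matches its pinned digest. A check like this only shows that the binary is the published one, not that the published one is trustworthy; that part would have to come from the bootstrapping procedure itself.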
Should we try to encourage languages:
- To have two diverse compilers?
- To have a compiler written in C (which can then be verified)?
- To have an interpreter written in another language? (This may be easier than a compiler)
- To create a compiler ring (A compiles B compiles C compiles A)?
Task: Investigate whether NetBSD can be fully diverse double-compiled.
Summary
Overall question: What are the benefits of complete bootstrapping from scratch? Anything besides protection against the Thompson attack? Fewer binaries to trust! Cross-compilation also becomes more important again.