Bootstrapping I

Website structure

(short summary from the problem - 2-3 concise sentences based on the Problems section?) (try to clarify what bootstrapping actually is)

To trust our computing platforms, we need to be able to follow the bootstrapping process (how each part was produced from source) and so be confident that they are built on good foundations.

(more detail on the intended outcomes and benefits)

(1. trust/security - most powerful/appealing motivation, mention this one first)

We want to draw attention to the need for an auditable, repeatable process for bootstrapping programming languages, compilers, pieces of the toolchain and whole distributions.

(2. easier porting (new platforms? languages?) - secondary benefit, important but fewer people are interested)

Another benefit is that it becomes easier to port these components to new hardware platforms.

(Motivation / benefits could become a separate section if it gets too big)

Compilers are often written in the language they are compiling. This creates a chicken-and-egg problem that leads users and distributors to rely on opaque, pre-built binaries of those compilers, which they use to build newer versions of the compiler. We believe that those opaque binaries are a threat to user security and user freedom since they are not auditable; we believe the number of bootstrap binaries should be minimized.

If you’re working on a compiler that is written in a language other than the one it’s compiling, you’re all set!

If your compiler is written in the language that it’s compiling (“self-hosted”), it probably falls in one of the following categories.

If other implementations of this programming language exist, please make sure your compiler can be built with one of these. Examples include:

If your compiler targets a language for which no other implementation exists, then please consider maintaining a (minimal) implementation of the language written in a different language. Most likely such an implementation exists, or existed at the point the programming language was created. Maintaining this alternate implementation has a cost; however, this cost should be minimal if the alternate implementation is used routinely to build the compiler and is kept simple. It does not need to be optimized.

Examples include:

Please let us know if you’d like to add your compiler to this list!

Build systems sometimes have chicken-and-egg problems: they may need a version of themselves to get built. If you are developing a build system, this can be avoided. We recommend that you provide an alternative way to build your build system.

Examples include:

Unlike a compiler, a build system does not need a full implementation of its own language in order to bootstrap. A slow and inefficient build written as a shell script, or for an older build system (Ant, GNU Make), can produce a minimal version of the build system, which is then used to build the complete version.

It is unavoidable that distributions use some binaries as part of their bootstrap chain. However, distributions should endeavour to provide traceability and automated reproducibility for such binaries. This means that:

For example, a distribution might use a binary package of GCC to build GCC from source. This bootstrap binary is in most cases built from a previous revision of the distribution’s GCC package. Thus, the distribution can label the binary with something like “this package was built by running the specified command on the specified revision of the distribution’s package repository.” A user can then easily reproduce the binary by fetching the specified sources and running the specified command. This build will in most cases depend on a previous generation of bootstrap binaries. Thus, we get a chain of verifiable bootstrap binaries stretching back in time.
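The chain described above can be sketched as a linked list of provenance labels. This is only an illustration, not how any particular distribution stores this information: the `BootstrapBinary` record, its field names, and the sample commands and revisions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BootstrapBinary:
    """Provenance label for one bootstrap binary (hypothetical schema)."""
    name: str
    command: str    # command that produced the binary
    revision: str   # revision of the distribution's package repository
    built_with: Optional["BootstrapBinary"] = None  # previous generation, if known


def provenance_chain(binary: BootstrapBinary) -> List[BootstrapBinary]:
    """Return the binary followed by each earlier generation it was built with."""
    chain = []
    current = binary
    while current is not None:
        chain.append(current)
        current = current.built_with
    return chain


# Illustrative values only; these commands and revisions are made up.
gen0 = BootstrapBinary("gcc-bootstrap", "build gcc", "rev-100")
gen1 = BootstrapBinary("gcc-bootstrap", "build gcc", "rev-200", built_with=gen0)
gen2 = BootstrapBinary("gcc-bootstrap", "build gcc", "rev-300", built_with=gen1)

for step in provenance_chain(gen2):
    print(f"{step.name}: ran '{step.command}' on revision {step.revision}")
```

Each entry names the command and revision needed to rebuild that generation, so a user could in principle walk the list from newest to oldest and reproduce every step.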

Bootstrap binaries may also come from upstream. This would typically be the case when a language is first added to a distribution. In this case, it may not be obvious how the binary can be reproduced, but the distribution should at least clearly label the provenance of the binary, e.g. “this binary was downloaded from https://upstream-compiler.example.org/upstream-compiler-20161211-x86_64-linux.tar.xz”.
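As a minimal sketch of such a label, a distribution could store the download URL together with a checksum, so that anyone can confirm they hold the same file. The checksum is our assumption here (the label above only mentions the URL), and the `UpstreamProvenance` record and `matches_label` helper are hypothetical names. The payload is a stand-in, not a real tarball.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamProvenance:
    """Where an upstream bootstrap binary came from (hypothetical schema)."""
    url: str
    sha256: str  # assumption: the label also records a SHA-256 digest


def matches_label(data: bytes, label: UpstreamProvenance) -> bool:
    """Check that the bytes we hold are the ones the label describes."""
    return hashlib.sha256(data).hexdigest() == label.sha256


# Stand-in bytes; a real check would read the downloaded file instead.
payload = b"stand-in for the downloaded tarball"
label = UpstreamProvenance(
    url="https://upstream-compiler.example.org/upstream-compiler-20161211-x86_64-linux.tar.xz",
    sha256=hashlib.sha256(payload).hexdigest(),
)

print(matches_label(payload, label))
print(matches_label(payload + b"tampered", label))
```

A matching checksum only shows that the file is the one that was labelled; it says nothing about how upstream produced it, which is why reproducing the binary from source remains the goal.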

TODO: provide an example of how we do this / are going to do this in Nixpkgs / Guix / …?

http://git.savannah.gnu.org/cgit/guix.git/commit/?id=062134985802d85066418f6ee2f327122166a567

Until recently, the latest Java Development Kit (JDK) could be bootstrapped in a chain starting with GCJ (the GNU Compiler for Java) and the IcedTea build system. GCJ was deleted from the GNU Compiler Collection in October 2016, so it is now unclear how to bootstrap the JDK in future. To ensure that the JDK can be built from sources without the need for an existing installation of OpenJDK, we propose to continue maintaining GCJ.

The C and C++ compilers of the GNU Compiler Collection make up the foundation of many free software distributions. Current versions of GCC are written in C++, which means that a C++ compiler is needed to build them from source. GCC 4.7 was the last version of the collection that could be built with a plain C compiler, which is a much simpler starting point. We propose to collectively maintain a subset of GCC 4.7 to ensure that we can build the foundation of free software distributions starting with a simple C compiler (such as TinyCC or pcc).

This is nice, but what are the actual benefits of “bootstrappable” implementations?

For users, bootstrappable implementations, together with reproducible builds (https://reproducible-builds.org), provide confidence that you are running the code you expect to be running. The source code is auditable by the developer community, which in turn provides reassurance that the code you’re running does not have backdoors.

Bootstrappable implementations provide clear provenance tracking: the dependency graph of your distribution packages shows how each binary was obtained.
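To illustrate what such provenance tracking could look like, here is a small sketch that walks a package's build inputs and reports which opaque bootstrap binaries it ultimately rests on. The graph, the package names, and the `bootstrap_roots` helper are hypothetical; a real distribution would derive the graph from its own package definitions.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package -> packages needed to build it.
BUILD_INPUTS: Dict[str, List[str]] = {
    "hello": ["gcc", "make"],
    "make": ["gcc"],
    "gcc": ["gcc-bootstrap-binary"],
    "gcc-bootstrap-binary": [],
}

# Packages that are taken as pre-built binaries rather than built from source.
BOOTSTRAP_BINARIES: Set[str] = {"gcc-bootstrap-binary"}


def bootstrap_roots(package: str,
                    graph: Dict[str, List[str]],
                    binaries: Set[str]) -> Set[str]:
    """Return every bootstrap binary reachable from `package`."""
    seen: Set[str] = set()
    roots: Set[str] = set()
    stack = [package]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        if node in binaries:
            roots.add(node)
        stack.extend(graph.get(node, []))
    return roots


print(sorted(bootstrap_roots("hello", BUILD_INPUTS, BOOTSTRAP_BINARIES)))
```

The smaller the set this returns for a whole distribution, the less opaque material there is to trust.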

If you are a compiler writer, making your compiler bootstrappable from a different language will simplify the development process (no need to carry large pre-built binaries around). It will also make it easier to port the compiler to a different platform for which no bootstrap binaries exist yet.

Try building GCC using GCC 4.7. This already works (we used GCC 4.7 some months ago in Guix, but updated later for unrelated reasons).

Try building GCC 4.7 with TinyCC.