The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do.
This is the sixth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation, as well as more recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix, and Hans-Christoph Steiner of the F-Droid project.
Today, however, we will be talking with David A. Wheeler, the Director of Open Source Supply Chain Security at the Linux Foundation.
Holger Levsen: Welcome, David, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself?
David: Sure! I’m David A. Wheeler and I work for the Linux Foundation as the Director of Open Source Supply Chain Security. That just means that my job is to help open source software projects improve their security, including its development, build, distribution, and incorporation in larger works, all the way out to its eventual use by end-users. In my copious free time I also teach at George Mason University (GMU); in particular, I teach a graduate course on how to design and implement secure software.
My background is technical. I have a Bachelor’s in Electronics Engineering, a Master’s in Computer Science and a PhD in Information Technology.
My PhD dissertation is connected to reproducible builds: it was on countering the ‘Trusting Trust’ attack, an attack that subverts fundamental build system tools such as compilers. The attack was discovered by Karger & Schell in the 1970s, and later demonstrated & popularized by Ken Thompson. In the dissertation I showed that a process called ‘Diverse Double-Compiling’ (DDC) could detect trusting trust attacks. That process is a specialized kind of reproducible build specifically designed to detect trusting trust style attacks. In addition, countering the trusting trust attack becomes more important mainly once reproducible builds become more common. Reproducible builds enable detection of build-time subversions, and most attackers wouldn’t bother with a trusting trust attack if they could just directly use a build-time subversion of the software they actually want to subvert.
Holger: Thanks for taking the time to introduce yourself to us. What do you think are the biggest challenges today in computing?
David: There are many big challenges in computing today. For example:
- Lack of resilience & capacity in chip fabrication. Fabs are extraordinarily expensive, and the technology at the high end continues to advance. As a result, supply is failing to meet demand, and geopolitical issues raise further concerns. We’ve seen cars, gaming consoles and many other devices unable to be delivered due to chip shortages. More fabs are being built, and some politicians are raising concerns, but it’s unclear that current efforts will be enough.
- Lack of enough developers able to develop the software that people & organizations need. Computers are far faster, and open source software has made software reuse incredibly easy. However, organizations still struggle to automate many tasks. The bottleneck is the lack of enough talented developers able to convert ideas into working software. ‘Low-code’ and ‘no-code’ approaches help in specialized areas, just like all previous ‘automate the programmer’ efforts of the last 60 years, but there’s no reason to believe they will help enough.
- Large scale of software. Small systems are easier to develop & maintain, but today’s systems increasingly get bigger to meet users’ needs & are much harder to manage. Even small embedded systems are often supported by huge back-end systems.
- Ending tail of Moore’s law & rise of smartphones. Historically people would just wait a few years for their software to speed up, but Moore’s law is petering out, and smartphones are necessarily limited by power & size limits. As a result, software developers can’t wait for the hardware to save their slow systems; they must redesign. Switching to faster languages, or using multiple processors, is much more difficult than waiting for performance problems to disappear.
- Continuous change in interfaces. Developers continuously find reasons to change component interfaces: perhaps they’re too inflexible, too hard to use, and so on. But now that developers are reusing hundreds, thousands, or tens of thousands of components, managing the continuous change of the reused components is challenging. Package managers make updating easy — but don’t automatically handle interface changes. I think this is mostly a self-inflicted problem — most components could support old interfaces (like the Linux kernel does) — but because it’s often not acknowledged as a problem, it’s often not addressed.
- Security & privacy. Decades ago there were fewer computers and most computers weren’t connected to a network. Today things are different. Criminals have found many ways to attack computer systems to make money, and nation-states have found many ways to attack computer systems for their own reasons. Attackers now have very strong motivations to perform attacks. Yet many developers aren’t told how to develop software that resists attacks, nor how to protect their supply chains. Operations teams try to monitor and recover from attacks, but their job is difficult because inadequately secure software doesn’t support those monitoring & recovery efforts well either. The result is terrible security.
Holger: Do you think reproducible builds are an important part in secure computing today already?
David: Yes, but first let’s put things in context.
Today, when attackers exploit software vulnerabilities, they’re primarily exploiting unintentional vulnerabilities that were created by the software developers. There are a lot of efforts to counter this:
- Train & educate developers in how to develop secure software. The OpenSSF provides a free course on how to do that (full disclosure: I’m the author). Take that course or something like it!
- Add tools to your CI pipeline to detect potential vulnerabilities. Yes, they have false positives and false negatives, so you have to also use your brain… but that just means you need to be smart about using tools, instead of not using them.
- Get projects & organizations to update the components they use, since often the vulnerabilities are well-known publicly (e.g., Equifax in 2017). Add some tools to your development process to warn you about components with known vulnerabilities! GitHub & GitLab both provide tools to do this, and there are many other tools.
- When starting new projects, try to use memory-safe languages. On average 70% of the vulnerabilities in Chrome and in Microsoft’s products are from memory safety problems; using a memory-safe language eliminates most of them.
We’re just starting to get better at this, which is good. However, attackers always try to attack the easiest target. As our deployed software has started to be hardened against attack, attackers have dramatically increased their attacks on the software supply chain (Sonatype found in 2022 that there’s been a 742% increase year-over-year).
The software supply chain hasn’t historically gotten much attention, making it the easy target.
There are simple supply chain attacks with simple solutions:
- In almost every year the top attack has been typosquatting. In typosquatting, an attacker creates packages with almost the right name. This is an easy attack to counter: developers just need to double-check the name of a package before adding it. But we aren’t warning developers enough about it! For more information, see papers such as the Backstabber’s Knife Collection.
- Last year the top software supply chain attack was ‘dependency confusion’ — convincing projects to use the wrong repo for a given package. There are simple solutions to this, such as specifying the package source and/or requiring a cryptographic hash to match.
- Some attacks involve takeovers of developer accounts. In almost all cases, these are caused by stolen passwords. Using a multi-factor authentication (MFA) token eliminates stolen password attacks, which is why several repositories are starting to require MFA tokens in some cases.
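The first two countermeasures in this list (double-checking a package name, and requiring a cryptographic hash to match) can be sketched in a few lines of Python. This is only an illustration, not how any particular package manager works: the function names, the `KNOWN_PACKAGES` allow-list and the 0.8 similarity cutoff are assumptions made up for the example.

```python
import difflib
import hashlib
import hmac

# Hypothetical allow-list of package names the project already trusts.
KNOWN_PACKAGES = ["requests", "numpy", "cryptography"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the trusted name that `name` closely resembles, or None.

    An exact match is fine; a near miss deserves a second look before installing.
    """
    if name in known:
        return None
    close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return close[0] if close else None


def matches_pinned_hash(artifact: bytes, pinned_sha256: str) -> bool:
    """Check a downloaded artifact against a SHA-256 digest pinned in advance."""
    actual = hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256.lower())
```

A name check like this is only a prompt for a human to look again. The pinned hash is the stronger control, because a package fetched from the wrong repository will not match the digest recorded earlier.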
Unfortunately, attackers know there are other lines of attack. One of the most dangerous is subverted build systems, as demonstrated by the subversion of SolarWinds’ Orion system. In a subverted build system, developers can review the software source code all day and see no problem, because there is no problem there. Instead, the process to convert source code into the code people run, called the ‘build system’, is subverted by an attacker.
One solution for countering subverted build systems is to make the build systems harder to attack. That’s a good thing to do, but you can never be confident that it was ‘good enough’. How can you be sure it’s not subverted, if there’s no way to know?
A stronger defense against subverted build systems is the idea of verified reproducible builds. A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts. A build is verified if multiple different parties verify that they get the same result for that situation. When you have a verified reproducible build, either all the parties colluded (and you could always double-check it yourself), or the build process isn’t subverted.
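As a rough sketch of what ‘verified’ means in practice, imagine that each independent party publishes the digest of the artifact it built, and a checker accepts the build only if all of those digests agree. The names below (`artifact_digest`, `verify_build`, the builder names) are hypothetical, and a real deployment would also have to authenticate who published each digest.

```python
import hashlib


def artifact_digest(artifact: bytes) -> str:
    """SHA-256 digest a builder would publish for its build output."""
    return hashlib.sha256(artifact).hexdigest()


def verify_build(reports: dict, min_builders: int = 2) -> bool:
    """Return True only if enough independent builders report one identical digest.

    `reports` maps a (hypothetical) builder name to the digest it obtained.
    """
    if len(reports) < min_builders:
        return False  # too few independent parties to call the build verified
    return len(set(reports.values())) == 1


# Example: two builders agree; a third reports a different artifact.
reports = {
    "builder-a": artifact_digest(b"release-1.0 contents"),
    "builder-b": artifact_digest(b"release-1.0 contents"),
}
agreed = verify_build(reports)
reports["builder-c"] = artifact_digest(b"release-1.0 contents, tampered")
disagreed = verify_build(reports)
```

Here `agreed` is true and `disagreed` is false: a single mismatching digest is enough to signal that either the build is not reproducible or one of the build environments has been subverted.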
There is one last turtle: What if the build system tools or machines are subverted themselves? This is not a common attack today, but it’s important to know if we can address it when the time comes. The good news is that we can address this. For some situations reproducible builds can also counter such attacks. If there’s a loop (that is, a compiler is used to generate itself), subverting it is called the ‘trusting trust’ attack, and that is more challenging. Thankfully, the ‘trusting trust’ attack has been known about for decades and there are known solutions. The ‘diverse double-compiling’ (DDC) process that I explained in my PhD dissertation, as well as the ‘bootstrappable builds’ process, can both counter trusting trust attacks in the software space. So there is no reason to lose hope: there is a ‘bottom turtle’, as it were.
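The core idea of DDC can be outlined as follows. This is a simplified sketch and not the full procedure from the dissertation: it assumes the compiler under test is claimed to have been built from `compiler_source` by itself, that compilation is deterministic, and that a second, independently produced `trusted_compiler` is available. The `run` callable, which executes a compiler binary on source code and returns the resulting binary, is a hypothetical helper.

```python
from typing import Callable

# Hypothetical helper type: run a compiler binary on source, return the output binary.
RunCompiler = Callable[[bytes, str], bytes]


def diverse_double_compile(
    run: RunCompiler,
    trusted_compiler: bytes,
    compiler_source: str,
    compiler_under_test: bytes,
) -> bool:
    """Return True if the binary under test matches what its source regenerates."""
    # Stage 1: build the compiler's source with a separate, trusted compiler.
    stage1 = run(trusted_compiler, compiler_source)
    # Stage 2: use the stage 1 result to build the same source again.
    stage2 = run(stage1, compiler_source)
    # With deterministic builds, stage 2 should be bit-for-bit identical
    # to the compiler under test; a difference calls for investigation.
    return stage2 == compiler_under_test
```

The point of the second stage is that stage 1 behaves like the compiler described by the source but was produced by a different toolchain, so a self-propagating backdoor hidden only in the original binary has no way to carry over into stage 2.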
Holger: Thankfully, this has all slowly started to change and supply chain issues are now widely discussed, as evidenced by efforts like Securing the Software Supply Chain: Recommended Practices Guide for Developers which you shared on our mailing list. In there, Reproducible Builds are mentioned as a recommended advanced practice, which is pretty cool (we’ve come a long way!), but to me it also sounds like it will take another decade until this becomes standard procedure. Do you agree on that timeline?
David: I don’t think there will be any particular timeframe. Different projects and ecosystems will move at different speeds. I wouldn’t be surprised if it took a decade or so for them to become relatively common — there are good reasons for that.
Today the most common kinds of attacks based on software vulnerabilities still involve unintentional vulnerabilities in operational systems. Attackers are starting to apply supply chain attacks, but the top such attacks today are typosquatting (creating packages with similar names) and dependency confusion (convincing projects to download packages from the wrong repositories).
Reproducible builds don’t counter those kinds of attacks; they counter subverted builds. It’s important to eventually have verified reproducible builds, but understandably other issues are currently getting prioritized first.
That said, reproducible builds are important long term. Many people are working on countering unintentional vulnerabilities and the most common kinds of supply chain attacks. As these other threats are countered, attackers will increasingly target build systems. Attackers always go for the weakest link. We will eventually need verified reproducible builds in many situations, and it’ll take a while to get build systems able to widely perform reproducible builds, so we need to start that work now. That’s true for anything where you know you’ll need it but it will take a long time to get ready — you need to start now.
Holger: What are your suggestions to accelerate adoption?
David: Reproducible builds need to be:
- Easy (ideally automatic). Tools need to be modified so that reproducible builds are the default or at least easier to do.
- Transparent to projects & potential users. Many projects have no idea that their results aren’t reproducible, and many potential users of the project don’t know either. That information needs to be obvious. I’ve proposed that the OpenSSF Dashboard SIG try to reproduce builds, for at least some packages, to make it more obvious to everyone when a project isn’t reproducible. I don’t know if that will happen in that particular case, but the point is to help people learn that information as soon as possible.
- Deployed. Experiments are great, but experiments showing that a project could be reproducible are inadequate. We need the projects that people use to be reproducible.
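To give a flavour of what ‘reproducible by default’ can mean for a tool, here is a small sketch of an archive step that avoids the usual sources of variation: file ordering, timestamps and ownership. It reads the `SOURCE_DATE_EPOCH` environment variable, a convention specified by the Reproducible Builds project for supplying a fixed build timestamp. The `build_archive` function itself is a made-up example and not part of any existing tool.

```python
import io
import os
import tarfile


def build_archive(files: dict) -> bytes:
    """Pack `files` (path -> bytes) into a tar archive with no varying metadata."""
    # Fixed timestamp supplied by the build environment (defaults to 0 here).
    mtime = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed order, independent of insertion order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime          # never the current time
            info.mode = 0o644           # fixed permissions
            info.uid = info.gid = 0     # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Two parties who call this with the same file contents and the same `SOURCE_DATE_EPOCH` should get byte-identical archives, regardless of when or in what order the files were added.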
I think there’s a snowball effect. Once many projects’ packages are reproducible, it will be easier to convince other projects to make their packages reproducible.
I also think there should be some prioritization. If a package is in wide use (e.g., part of the minimum set of packages for a widely-used Linux distribution or framework), its reproducibility should be a special focus. If a package is vital for supporting some societally important critical infrastructure (e.g., running dams), it should also be considered important. You can then work on the ones that are less important over time.
Holger: How is the Best Practices Badge going? How many projects are participating and how many are missing?
David: It’s going very well. You can see some automatically-generated statistics, showing we have over 5,000 projects, adding more than 1/day on average. We have more than 900 projects that have earned at least the ‘passing’ badge level.
Holger: How many of the projects participating in the Best Practices badge are engaging with reproducible builds?
David: As of this writing there are 168 projects that report meeting the reproducible builds criterion. That’s a relatively small percentage of projects. However, note that this criterion (labelled build_reproducible) is only required for the ‘gold’ badge. It’s not required for the passing or silver level badge.
Currently we’ve been strategically focused on getting projects to at least earn a passing badge, and less on earning silver or gold badges. We would love for all projects to earn a silver or gold badge, of course, but our theory is that projects that can’t even earn a passing badge present the most risk to their users.
That said, there are some projects we especially want to see implementing higher badge levels. Those include projects that are very widely used, so that vulnerabilities in them can impact many systems. Examples of such projects include the Linux kernel and curl. In addition, some projects are used within systems where it’s important to society that they not have serious security vulnerabilities. Examples include projects used by chemical manufacturers, financial systems and weapons. We definitely encourage any of those kinds of projects to earn higher badge levels.
Holger: Many thanks for this interview, David, and for all of your work at the Linux Foundation and elsewhere!
For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing firstname.lastname@example.org.