Buy-in
Working on reproducible builds might look at first like a lot of effort for little gain. While this is true of many types of security-related work, there are already some good arguments and testimonies on why reproducible builds matter.
Resisting attacks
In March 2015, The Intercept published, from the Snowden leaks, the abstract of a talk given at an internal CIA conference in 2012, titled “Strawhorse: Attacking the MacOS and iOS Software Development Kit”. The abstract clearly explains how unnamed researchers had been creating a modified version of Xcode that would, without the developer’s knowledge, watermark compiled applications or insert spyware into them.
A few months later, malware dubbed “XcodeGhost” was found targeting developers so that they would unknowingly distribute malware embedded in iOS applications. Palo Alto Networks describes it as:
XcodeGhost is the first compiler malware in OS X. Its malicious code is located in a Mach-O object file that was repackaged into some versions of Xcode installers. These malicious installers were then uploaded to Baidu’s cloud file sharing service for use by Chinese iOS/OS X developers
The purpose of reproducible builds is exactly to resist such attacks. Recompiling these applications with a clean compiler would have made the problem easily visible, especially given the size of the added payload.
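The comparison described above can be sketched in a few lines. This is a minimal illustration, not a tool used by any particular project: the function names `sha256_of` and `artifacts_match` are hypothetical, and it assumes that the distributed artifact and the independently rebuilt one are both available as local files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifacts_match(distributed: Path, rebuilt: Path) -> bool:
    """True when the two files have the same digest, i.e. identical content."""
    return sha256_of(distributed) == sha256_of(rebuilt)
```

If the build is reproducible, a mismatch between the distributed file and a rebuild made with a clean toolchain is a signal worth investigating. Without reproducibility, a mismatch carries no information, because benign differences are expected.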
As Mike Perry and Seth Schoen explained in December 2014 during a talk at 31C3, problematic changes might be more subtle, and a single bit might be all that is required to create a remotely exploitable security hole. Seth Schoen also demonstrated kernel-level malware that would compromise the source code while it is read by the compiler, without leaving any traces on disk. While to the best of our knowledge such attacks have not been observed in the wild, reproducible builds are the only way to detect them early.
Quality assurance
Regular tests are required to make sure that software can be built reproducibly in various environments. Debian and other free software distributions require that users be able to build the software they distribute. Such regular tests help avoid “fails to build from source” (FTBFS) bugs and can uncover rare build problems such as timing issues, race conditions, or builds affected by the locale.
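A test of this kind can be sketched as follows. This is a simplified illustration under stated assumptions: the build is a command that writes its artifact to standard output, only the timezone and locale are varied, and the names `VARIATIONS`, `build_digest`, and `is_reproducible` are hypothetical. Real test setups vary many more factors.

```python
import hashlib
import os
import subprocess
from typing import Mapping, Sequence

# Hypothetical environment variations; a real setup would vary far more
# (build path, user name, file ordering, system time, and so on).
VARIATIONS = [
    {"TZ": "UTC", "LC_ALL": "C"},
    {"TZ": "Pacific/Kiritimati", "LC_ALL": "fr_FR.UTF-8"},
]


def build_digest(command: Sequence[str], extra_env: Mapping[str, str]) -> str:
    """Run the build under a modified environment and hash its stdout."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(list(command), env=env, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(command: Sequence[str], variations=VARIATIONS) -> bool:
    """True when every environment variation yields the same output digest."""
    digests = {build_digest(command, variation) for variation in variations}
    return len(digests) == 1
```

A build that embeds the timezone or locale in its output produces different digests here. A passing result only shows that the build is insensitive to the factors that were actually varied.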
Build environments may keep evolving after a project has stopped receiving major development. During work on Debian, several high-impact but hard-to-detect bugs were identified by testing builds in varying environments. Examples include a library whose application binary interface differed with every build, strings garbled by an encoding mismatch, missing translations, and changing dependencies.
Having to reflect on the build environment also helps developers think about their relationship with external software or data providers. Relying on external sources with no backup plan might cause serious trouble in the long term.
Reproducible builds also make it possible to recreate matching debug symbols for a distributed build, which can help in understanding issues in software used in production.
Smaller Binary Differences
Having reproducible builds means that only changes in the source code or build environment (such as the compiler version) lead to differences in the generated binaries. This minimizes changes in the artifacts, which reduces storage requirements and network traffic for delta updates.
With similar artifacts, testing can focus on the parts that changed while preserving confidence in unchanged code. This can speed up both quality assurance and development.
Changes to the build system can be tested easily with reproducible builds: If the output artifacts are identical, the changes will not affect runtime behavior.
Increased Development Speed
Dependent packages do not need to be rebuilt and dependent tasks do not need to be rerun if a rebuild of a package does not yield different results. This can significantly reduce build times and lead to faster development speeds and lower cost.
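The idea of skipping downstream work when a rebuild yields an identical result can be sketched like this. It is a toy model, not the logic of any real build system: the function `dependents_to_rebuild`, the recorded-digest table, and the reverse-dependency map are all hypothetical.

```python
import hashlib
from typing import Dict, List


def digest(artifact: bytes) -> str:
    """Hex SHA-256 digest of an in-memory build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def dependents_to_rebuild(
    package: str,
    new_artifact: bytes,
    recorded_digests: Dict[str, str],
    reverse_deps: Dict[str, List[str]],
) -> List[str]:
    """Return the direct dependents that need rebuilding after `package` was rebuilt.

    If the new artifact is identical to the recorded one, nothing downstream
    can have changed, so no dependent has to be rebuilt.
    """
    if recorded_digests.get(package) == digest(new_artifact):
        return []
    return sorted(reverse_deps.get(package, []))
```

Applying the same check after each dependent is rebuilt stops the propagation as soon as an output is unchanged. This shortcut is only sound when builds are reproducible, because otherwise every rebuild appears to produce a new result.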
Build speeds can also be improved by showing that cross-compilation produces the same result as native compilation and then doing the majority of builds with cross-compilation on faster machines.
“But how can I trust my compiler?”
A common question about reproducible builds is: how can anyone know that the build environment is not compromised if everyone is using the same binaries? Or: how can I trust that the compiler I just built was not compromised by a backdoor in the compiler I used to build it?
The latter problem has been known in the academic literature since Ken Thompson’s paper Reflections on Trusting Trust, published in 1984. It is the paper cited in the description of the “Strawhorse” talk mentioned earlier.
The technique known as Diverse Double-Compilation, formally defined and researched by David A. Wheeler, can answer this question. In short: given two compilers, one trusted and one under test, the source code of the compiler under test is built twice, once with each compiler. The two compilers created by this build are then each used to build the compiler under test again. If the outputs are the same, then we have proof that no backdoors have been inserted during the compilation. For this scheme to work, the final compilations must produce identical output whenever no backdoor is present. And that’s exactly where reproducible builds are useful.
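The procedure summarized above can be illustrated with a toy model. This sketch is not Wheeler’s formal definition: the `Binary` class, the `compile_compiler` function, and the way a backdoor propagates are simplifying assumptions made for illustration only. A compiler executable is reduced to three properties, and the compiler source is described only by the code style it emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy stand-in for a compiler executable."""
    emits: str        # style of code this compiler generates
    built_as: str     # style of code this executable itself consists of
    backdoored: bool  # whether a self-propagating backdoor is present


def compile_compiler(compiler: Binary, source_emits: str) -> Binary:
    """Build the compiler source with `compiler`.

    The result behaves as the source says (it emits `source_emits`), is made
    of the code style that `compiler` generates, and inherits the backdoor if
    `compiler` carries one, as in Thompson's attack.
    """
    return Binary(
        emits=source_emits,
        built_as=compiler.emits,
        backdoored=compiler.backdoored,
    )


def ddc_outputs_match(trusted: Binary, under_test: Binary, source_emits: str) -> bool:
    """Build the source once with each compiler, then again with each result."""
    stage1_trusted = compile_compiler(trusted, source_emits)
    stage1_tested = compile_compiler(under_test, source_emits)
    stage2_trusted = compile_compiler(stage1_trusted, source_emits)
    stage2_tested = compile_compiler(stage1_tested, source_emits)
    return stage2_trusted == stage2_tested
```

In this model the first-stage outputs differ even for a clean compiler, because they were generated by different compilers. Only the second-stage outputs are comparable, and they are equal exactly when the compiler under test carries no backdoor. The equality check stands in for a bit-for-bit comparison, which is meaningful only if the compiler builds reproducibly.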
Dependency Tree Awareness and Software Bill of Materials (SBOM)
Reproducible builds enhance the transparency and integrity of software development by requiring developers to know their full dependency tree, so that all dependencies are exactly as intended, without any undisclosed changes or additions. This level of control is crucial for security, as it helps identify and mitigate vulnerabilities that often reside in overlooked components.
This in turn facilitates the creation of a Software Bill of Materials (SBOM), a comprehensive inventory of all components, libraries, and modules included in a piece of software. SBOMs are increasingly important now that open-source components and third-party libraries are ubiquitous. They are used for regulatory compliance, vulnerability management, and risk assessment, which makes reproducible builds valuable for maintaining software integrity, reliability, and security as threats and regulatory requirements evolve.
By making accurate SBOMs easier to generate and requiring a thorough understanding of software dependencies, reproducible builds support security and compliance objectives and align with the broader industry trend towards more secure and accountable software development practices.
Ephemeral Development Environments
Ephemeral development environments, enabled by reproducible builds, represent a significant shift in software development. Identical development environments and their dependencies are created on demand, replacing traditional static setups with task-specific configurations that are disposed of after use. This minimizes setup time and configuration conflicts, and it enhances security by reducing exposure to vulnerable components.
The cornerstone of this approach is the consistent application of reproducible builds, which ensures that environments are provisioned identically whenever they are needed. This consistency is critical for integrating with practices such as containerization and Infrastructure as Code (IaC), and it streamlines the onboarding of new team members by automating setup and reducing the likelihood of errors.
These practices align with the principles of DevSecOps: they support a more agile development cycle through automated testing and CI/CD pipelines, and they improve security by isolating tasks and providing dependencies only temporarily. Adopting ephemeral environments together with reproducible builds moves teams towards more flexible, efficient, and secure software development.
Other resources
The following articles contain some more arguments:
- Cyberwar and Global Compromise by Mike Perry from the Tor Project, 2013-08-20
- Software Transparency: Part 1 by yan, 2014-07-11.