Scalable Build Systems: An Analysis of Tup

I previously argued that any tool whose running time is proportional to the number of files in a project ends up scaling quadratically with the age of that project. Bluem00 on Hacker News pointed me towards Tup, a scalable build system with goals similar to ibb's.

Mike Shal, Tup’s author, wrote Build System Rules and Algorithms, formalizing the algorithmic deficiencies of existing build systems and describing Tup’s implementation, a significant improvement over the status quo. I would like to document my analysis of Tup and whether I think it replaces ibb.

Before we get started, I’d like to thank Mike Shal for being receptive to my comments. I sent him a draft of my analysis and his responses were thoughtful and complete. With his permission, I have folded his thoughts into the discussion below.

Is Tup suitable as a general-purpose build system? Will it replace SCons or Jam or Make anytime soon? Should I continue working on ibb?

Remember our criteria for a scalable build system, one that enables test-driven development at arbitrary project sizes:

  1. O(1) no-op builds
  2. O(changes) incremental builds
  3. Accessible dependency DAG atop which a variety of tools can be built

Without further ado, my thoughts on Tup follow:


Syntax

Tup defines its own declarative syntax, similar to Make or Jam. At first glance, the Tup syntax looks semantically equivalent to Make. From the examples:

: hello.c |> gcc hello.c -o hello |> hello

Read the dependency graph from left to right: hello.c is compiled by gcc into a hello executable. Tup supports variable substitution and limited flow control.

Build systems are inherently declarative, but I think Tup’s syntax has two flaws:

  1. Inventing a new syntax unnecessarily slows adoption: by implementing GNU Make’s syntax, Tup could have been a huge drop-in improvement for existing builds.
  2. Even though specifying a dependency graph is naturally declarative, I think a declarative syntax is a mistake. Build systems are a first-class component of your software and your team’s workflow, so you should be able to develop them in a well-known, high-level language such as Python or Ruby, especially since those languages come with rich libraries. SCons gets this right: it’s trivial for me to write CPU autodetection logic for parallel builds directly in a build script if that makes sense (see the sketch after this list), or to extend SCons’s Node system to download source files from the web.
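
For instance, here is a minimal sketch of that CPU-autodetection idea in a SConstruct, using only standard SCons and Python calls:

# SConstruct (sketch): run as many parallel jobs as the machine has cores,
# unless the user explicitly asked for something else with -j.
import multiprocessing

if GetOption('num_jobs') <= 1:
    SetOption('num_jobs', multiprocessing.cpu_count())

This kind of logic has no natural home in a purely declarative syntax.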

Implementation Language

Tup is 15,000 lines of C. There’s no inherent problem with C, but I do think a community-supported project is more likely to thrive in a safer language that’s faster to develop in, such as Python or Ruby. Having worked with teams of engineers, I’ve seen that most engineers can work safely in Python with hardly any spin-up time. I can’t say the same of C.

Git is an interesting case study: The core performance-sensitive data structures and algorithms are written in C, but many of its interesting features are written in Perl or sh, including git-stash, git-svn, and git-bisect. Unlike Git, I claim Python and Ruby are plenty efficient for the entirety of a scalable build system. Worst case, the dependency graph could live in C and everything else could stay in Python.

Scanning Implicit Dependencies

The Tup paper mentions offhand that it’s trivial to monitor a compiler’s file accesses and thus determine its true dependencies for generating a particular set of outputs. The existing implementation uses an LD_PRELOAD shim to monitor all file accesses attempted by, say, gcc, and treats those as canonical input files. Clever!

This is a great example of lateral, scrappy thinking. It has a couple huge advantages:

  1. No implicit dependencies (such as C++ header file includes) need be specified — if all dependencies come from the command line or a file, Tup will know them all.
  2. It’s easy to implement. Tup’s ldpreload.c is a mere 500 lines.

And a few disadvantages:

  1. Any realistic build system must treat Windows as a first-class citizen. Perhaps, on Windows, Tup could use something like Detours. I’ll have to investigate that.
  2. Intercepting system calls is reliable when the set of system calls is known and finite. However, there’s nothing stopping the OS vendor from adding new system calls that modify files.
  3. Incremental linking / external PDB files: these Visual C++ features both read and write the same file in one compile command. SCons calls this a SideEffect: commands that share a SideEffect cannot run in parallel (see the example after this list). A build system that does not support incremental linking or external PDB files would face resistance among Visual C++ users.
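
For reference, here is how that constraint is expressed in an SCons build script (SideEffect is real SCons API; the PDB name and object lists are illustrative):

# Both libraries write to the same vc100.pdb, so SCons must not build them concurrently.
env.SideEffect('vc100.pdb', foo_objects + bar_objects)

An LD_PRELOAD-style tracer would see both commands writing the shared file and would need an equivalent way to serialize them.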

And some open questions:

  1. I haven’t completely thought this through, but it may be important to support user-defined dependency scanners that run before command execution, enabling tools such as graph debugging.
  2. I don’t have a realistic example, but imagine a compiler whose file accesses vary spuriously from run to run; say, a compiler that checks its license file only every other day.

Stepping back, I think the core build system should not be responsible for dependency scanning. By focusing on dependency graph semantics and leaving dependency scanning up to individual tools (which may or may not use LD_PRELOAD or similar techniques), a build system can generalize to uses beyond compiling software, as I mentioned in my previous blog post.
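
To illustrate what tool-side dependency scanning could look like, here is a minimal sketch that runs a compiler under strace and records the files it successfully opens. It assumes a Linux host with strace installed, and it is a stand-in for the idea, not Tup's actual LD_PRELOAD code:

import os
import re
import subprocess
import tempfile

def scan_dependencies(command):
    """Run `command` under strace and return the set of files it opened."""
    fd, trace_path = tempfile.mkstemp(suffix='.trace')
    os.close(fd)
    try:
        subprocess.check_call(
            ['strace', '-f', '-e', 'trace=open,openat', '-o', trace_path] + list(command))
        opened = set()
        with open(trace_path) as trace:
            for line in trace:
                # e.g. openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY) = 3
                m = re.search(r'open(?:at)?\(.*?"([^"]+)".*=\s*(-?\d+)', line)
                if m and int(m.group(2)) >= 0:
                    opened.add(m.group(1))
        return opened
    finally:
        os.remove(trace_path)

print(sorted(scan_dependencies(['gcc', 'hello.c', '-o', 'hello'])))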

Dependency Graph

Tup’s dependency DAG contains two types of nodes: Commands and Files. Files depend on Commands and Commands depend on other Files. I prefer Tup’s design over SCons’s DAG-edges-are-commands design for two reasons:

  1. It simplifies the representation of multiple-input multiple-output commands.
  2. Some commands, such as “run-test foo” or “search-regex some.*regex”, depend on source files but produce no files as output. Since they fit naturally into the build DAG, commands are a first-class concept (see the sketch below).
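
A minimal sketch of such a bipartite graph, as I picture it (my own illustration, not Tup’s actual data structures):

class File(object):
    def __init__(self, path):
        self.path = path
        self.producer = None      # the Command that generates this file, if any

class Command(object):
    def __init__(self, command_line, inputs, outputs):
        self.command_line = command_line
        self.inputs = inputs      # Files this Command reads
        self.outputs = outputs    # Files this Command writes; may be empty (e.g. run-test)
        for f in outputs:
            f.producer = self

# hello.c --[gcc hello.c -o hello]--> hello
hello_c = File('hello.c')
hello = File('hello')
compile_hello = Command('gcc hello.c -o hello', [hello_c], [hello])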

Build Reliability

Tup, like SCons, places a huge emphasis on build reliability. This is key and I couldn’t agree more. In the half-decade I’ve used SCons, I can count the number of broken builds on one hand. Sadly, many software developers are used to typing “make clean” or clicking “full rebuild” when something is weird. What a huge source of waste! Developers should trust the build system as much as their compiler, and the build system should go out of its way to help engineers specify complete and accurate dependencies.

Reliable builds imply:

  1. Changes are tracked by file contents, not timestamps.
  2. The dependency graph, including implicit dependencies such as header files and build commands, is complete and accurate by default.
  3. Compiler command lines are included in the DAG. Put another way: if the command used to build a file changes, the file must be rebuilt.

Tup takes a strict functional approach and formalizes build state as a set of files and their contents. (I would argue build state also includes file metadata such as file names and timestamps, at least if the compiler uses such information.) If the build state does not change between invocations, then no work must be done.
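
To make the functional view concrete, here is one way to compute a signature over a command and its inputs (my own sketch, not Tup’s or SCons’s actual scheme):

import hashlib

def build_signature(command_line, input_paths):
    """Hash the command line together with the contents of every input file.
    If the signature matches the one recorded after the last build, the outputs
    are up to date; a changed flag or a changed header forces a rebuild."""
    h = hashlib.sha1()
    h.update(command_line.encode('utf-8'))
    for path in sorted(input_paths):
        h.update(path.encode('utf-8'))
        with open(path, 'rb') as f:
            h.update(f.read())
    return h.hexdigest()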

Tup even takes build reliability one step further than SCons: If you rename a target file in the build script, Tup actually deletes the old built target before rebuilding the new one. Thus, you will never have stale target executables lying around in your build tree.
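
A sketch of that cleanup behavior (again my own illustration of the idea, not Tup’s code):

import os

def remove_stale_outputs(previous_outputs, current_outputs):
    """Delete files that the previous build produced but that the current build
    script no longer declares, so stale binaries never linger in the build tree."""
    for path in set(previous_outputs) - set(current_outputs):
        if os.path.exists(path):
            os.remove(path)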

Nonetheless, there are situations where a project may choose to sacrifice absolute reliability for significant improvements in build speed, such as incremental linking discussed above.

Core vs. Community

A build system is a critical component of any software team’s development process. Since every team is different, it’s essential that a build system is flexible and extensible. SCons, for example, correctly chose to implement build scripts in a high-level language (Python) with a declarative API for specifying nodes and edges in the dependency graph.

However, I think SCons did not succeed at separating its core engine from its community. SCons tightly couples the underlying dependency graph with support for tools like Visual C++, gcc, and version control. The frozen and documented SCons API is fairly high-level while the (interesting) internals are treated as private APIs. It should be the opposite: a dependency graph is a narrow, stable, and general API. By simplifying and documenting the DAG API, SCons could enable broader uses, such as unit test execution.
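
The kind of narrow graph API I have in mind might look something like this (a hypothetical interface, not SCons’s actual internals):

class DependencyGraph(object):
    """A deliberately small, stable surface: register nodes and edges, then ask what is dirty."""

    def add_file(self, path):
        """Register a source or generated file node."""
        raise NotImplementedError

    def add_command(self, command_line, inputs, outputs):
        """Register a command node that reads `inputs` and produces `outputs`."""
        raise NotImplementedError

    def dirty_commands(self, changed_files):
        """Return the commands that must re-run, given the files that changed."""
        raise NotImplementedError

A test runner, a grep tool, or a Visual C++ tool package could all sit on top of exactly this surface without touching its internals.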


Autoconfiguration

Like Tup’s author, I agree that build autoconfiguration (such as autoconf or SCons’s Configure support) should not live in the core build system. Autoconfiguration is simply an argument that build scripts should be specified in a general programming language and that the community should develop competing autoconfiguration systems. If a particular autoconfiguration system succeeds in the marketplace, then, by all means, ship it with your build tool. Either way, it shouldn’t have access to any internal APIs. Configuration mechanisms are highly environment-sensitive and are best maintained by the community anyway.

DAG post-process optimizations

Another argument for defining a build tool in a general-purpose language is to allow user-defined DAG optimizations and sort orders. I can think of two such use cases:

  1. Visual C++ greatly improves compile times when multiple C++ files are specified on one command line. In fact, the benefit of batched builds can exceed the benefit of PCH. A DAG optimizer would search for sets of C++ source files that produce object files in the same directory and rewrite their individual command lines into one (sketched after this list).
  2. When rapidly iterating, it would be valuable for a build system or test runner to sort the work so that the most recently failed compile or test runs first. However, when hunting test interdependencies as part of a nightly build, you may want to shuffle the test order. On machines with many cores but slow disks, you want to schedule expensive links as early as possible to reduce the risk that several run concurrently and thrash against your disk.
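
A rough sketch of the batching pass from item 1, written against the hypothetical Command nodes from the dependency-graph sketch above (the node attributes used here are assumed for the illustration; the /Fo<dir>\ form of cl.exe really does compile several sources into one output directory):

from collections import defaultdict

def batch_cl_commands(commands):
    """Merge per-file 'cl.exe /c' commands that share an output directory into
    one command line; leave every other command untouched."""
    batches = defaultdict(list)
    passthrough = []
    for cmd in commands:
        if cmd.tool == 'cl.exe' and cmd.compile_only:
            batches[cmd.output_dir].append(cmd)
        else:
            passthrough.append(cmd)
    merged = []
    for output_dir, group in batches.items():
        sources = ' '.join(c.source for c in group)
        merged.append(Command(
            'cl.exe /c /Fo%s\\ %s' % (output_dir, sources),
            inputs=[i for c in group for i in c.inputs],
            outputs=[o for c in group for o in c.outputs]))
    return passthrough + merged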


Conclusion

Tup is a significant improvement over the status quo, and I have personally confirmed its performance — it’s lightning fast and it scales to arbitrary project sizes.

However, without out-of-the-box Windows support, a mainstream general-purpose language, and a model for community contribution, I don’t see Tup rapidly gaining traction. With the changes I suggest, it could certainly replace Make and perhaps change the way we iterate on software entirely.

Next, I intend to analyze prebake.

Your Version Control and Build Systems Don’t Scale

Let’s spend a minute and talk about the performance of a tool you use every day: your build system. For this purpose, they’re all the same, so let’s assume you use GNU Make. For a no-op build, what is Make’s running time? You know, that O(N) stuff.

If you said O(Files), you’re right. Every time you type make and press enter, it scans every file in the project, looking for out-of-date sources. That doesn’t sound so bad. Linear algorithms are good, right?

Oh, but projects grow over time, typically in proportion to the number of engineers (even if some are net negative). Since projects grow:

O(Files) = O(Engineers * Time)

Assuming you’re successful, your team is probably growing too:

O(Engineers) = O(Time)

Applying some substitutions:

O(Make) = O(Files) = O(Engineers * Time) = O(Time^2)

A no-op make is quadratic in the duration of the project?!

Exhibit A, Mozilla:

Chad@laptop /c/xulrunner/mozilla-central/obj-xulrunner
$ time make
rm -f -rf ./dist/sdk
[tons of output snipped]
make[1]: Leaving directory `/c/xulrunner/mozilla-central/obj-xulrunner'

real    21m31.526s
user    4m13.292s
sys     9m15.066s

21.5 MINUTES! This is a no-op build! I changed nothing from the last build!

Mozilla’s build is an especially egregious example, but it’s not uncommon for nontrivial Make, SCons, or Jam no-op builds to exceed 15 seconds. Every time you run SCons, for example, it:

  • loads Python,
  • imports a bunch of Python code,
  • creates an entire dependency graph in memory,
  • scans all files for changes,
  • determines what to build,
  • builds it,
  • and throws all of those intermediate results away.

Am I the only person who thinks this situation is insane? We’re spending all of our time improving constant factors rather than addressing fundamentally quadratic build algorithms.

No-op builds should be O(1) and instantaneous, and most other builds should be O(WhateverChanged). What if I told you this is possible?

Introducing IBB: I/O-Bound Build

Dusty, one of the best engineers I know, once told me that C compilers are now fast enough that they’re I/O-bound, not CPU-bound as you’d expect. That was the inspiration for a build system dependency engine architecture (a toy sketch follows this list):

  • A long-running server process that contains a complete dependency graph between files, including the commands required to update targets from sources.
  • An ultra-tiny C program that communicates build requests to the build server.
  • A background thread that watches for filesystem updates (via ReadDirectoryChangesW or libfam) and updates the dependency graph in memory.
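
Here is a toy sketch of that shape, with a blocking socket server standing in for the dependency engine (the port number and protocol are made up; the real ibb client is a tiny C program):

import socket

def serve(host='127.0.0.1', port=4778):
    """Long-running build server: keep the dependency graph in memory and
    answer build requests from a tiny client over a local socket."""
    graph = {}  # stands in for the in-memory dependency graph
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    while True:
        conn, _ = listener.accept()
        request = conn.recv(4096).decode('utf-8').strip()
        # a real server would walk only the dirty portion of the graph here
        conn.sendall(('built: %s\n' % request).encode('utf-8'))
        conn.close()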

I have built a prototype of ibb’s architecture, and its code is available on my GitHub.

In ibb, a no-op build takes constant time, no matter the size of the project. I tested with Noel Llopis’s build system stress test and no-op builds are merely 50 milliseconds. Take that, everyone else!

A Faster Grep

Let’s say you wanted to run a regular expression across your entire codebase. On Windows, you’ll spend all of your time in kernel calls, even if the files are in cache. However, remember that modern memory bandwidths are measured in GB/s…

ibb’s file search implementation can run a regular expression across 200 MB of PHP and JavaScript in 100 milliseconds. It copies file data into memory on the first run and rapidly scans across that memory on subsequent runs. It could just as easily mmap the files and lean on the OS’s disk cache to avoid kernel calls.
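
A minimal sketch of the in-memory scan (the cache and names here are my own illustration, not ibb’s actual code):

import re

_contents = {}  # path -> file contents, loaded once and kept hot in memory

def fast_grep(pattern, paths):
    """Scan the in-memory copy of each file; only the first call touches the disk."""
    regex = re.compile(pattern)
    for path in paths:
        if path not in _contents:
            with open(path, 'r', errors='ignore') as f:
                _contents[path] = f.read()
        for number, line in enumerate(_contents[path].splitlines(), 1):
            if regex.search(line):
                yield path, number, line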

Minimal Test Runner

Carl Masak’s inspirational addictive TDD harness is also possible with ibb. If you can determine which unit tests depend on which source files (via an __import__ hook in Python, say), you can run appropriate tests immediately after saving a source file.
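
A sketch of the import-hook idea (Python 3 spelling; a real runner would also watch the recorded files and re-run the affected tests when they change):

import builtins

test_imports = {}  # test module name -> set of modules it imported

def track_imports(test_name):
    """Wrap __import__ so we learn which modules a test pulls in."""
    original_import = builtins.__import__
    def tracking_import(name, *args, **kwargs):
        test_imports.setdefault(test_name, set()).add(name)
        return original_import(name, *args, **kwargs)
    builtins.__import__ = tracking_import
    return original_import  # caller restores builtins.__import__ to this afterwards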

Version Control

I’ve watched both git and Subversion fall apart as a codebase grows. git is so faaast… until you import a 20 GB repository. Then it’s O(Files) just like everything else.

$ time git status
# On branch master
nothing to commit (working directory clean)

real    0m9.603s
user    0m0.858s
sys     0m7.378s

10 seconds to see what files I’ve modified. Ouch.

Imagine if ibb could keep track of which files have changed…

What This Means For You

Running tests, grepping code, and building binaries from source are equivalent problems. ibb’s architecture supports lightning-fast implementations of all of the above.

By converting an O(Time^2) build, test runner, or version control system to O(1), we eliminate hours of costly engineer time and encourage good habits like committing often and frequently running tests, no matter the age of the project. Frankly, instant feedback feels great.

It’s crazy to me that this architecture isn’t standard practice across development tools.

Take ibb’s concepts or code, integrate it with your engineering processes, and improve the quality of your team’s life today.

Fast Builds: Incremental Linking and Embedded SxS Manifests

As I’ve said before, fast builds are crucial for efficient development. But for those of us who use C++ regularly, link times are killer. It’s not uncommon to spend minutes linking your compiled objects into a single binary. Incremental linking helps a great deal, but, as you’ll see, incremental linking has become a lot harder in the last few versions of Visual Studio…

Linking an EXE or DLL is a very expensive operation — it’s roughly O(N) where N is the amount of code being linked. Worse, several optimizing linkers defer code generation to link time, exacerbating the problem! When you’re trying to practice TDD, even a couple seconds in your red-green-refactor iteration loop is brutal. And it’s not uncommon for large projects to spend minutes linking.

Luckily, Visual C++ supports an /INCREMENTAL flag, instructing relinks to modify the DLL or EXE in-place, reducing link time to O(changed code) rather than O(all code). In the olden days of Visual C++ 6, all you had to do was enable /INCREMENTAL, and bam, fast builds.
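
In SCons terms, turning incremental linking on is a one-line tweak (a sketch, assuming env is your MSVC construction environment; real builds also want /DEBUG and matching compiler flags):

# Ask link.exe to patch the previous binary in place instead of relinking everything.
env.Append(LINKFLAGS=['/INCREMENTAL'])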

These days, it’s not so simple. Let’s take an excursion into how modern Windows finds DLL dependencies…

Side-by-Side (SxS) Manifests

Let’s say you’re writing a DLL foo.dll that depends on the CRT by using, say, printf or std::string. When you link foo.dll, the linker will also produce foo.dll.manifest. Windows XP and Vista use .manifest files to load the correct CRT version. (This prevents DLL hell: two programs can depend on different versions of the same DLL.)

Since remembering to carry around .manifest files is annoying and error-prone, Microsoft and others recommend that you embed them into your EXE or DLL as a resource:

mt.exe -manifest foo.dll.manifest -outputresource:foo.dll;2

Convenient, but it modifies the DLL in place, breaking incremental links! This is a known problem, and the “solutions” others suggest are INSANE. My favorite is the 300-line makefile with a note from the author “[If this does not work], please let me know ASAP. I will try fixing it for you.” Why doesn’t Visual Studio just provide an /EMBEDMANIFESTRESOURCE flag that would automatically solve the problem?!

I just want incremental linking and embedded manifests. Is that so much to ask? I tried a bunch of approaches. Most didn’t work. I’ll show them, and then give my current (working) approach. If you don’t care about the sordid journey, skip to the end.

What Didn’t Work

  • Not embedding manifests at all.

What went wrong: I could never figure out the rules whereby manifest dependencies are discovered. If python.exe depends on the release CRT and your module DLL depends on the debug CRT, and they live in different directories (??), loading the module DLL would fail. Gave up.

  • Linking a temporary file (foo.pre.dll), making a copy (foo.pre.dll -> foo.dll), and embedding foo.pre.dll.manifest into foo.dll with mt.exe.

What went wrong: As far as I can tell, mt.exe is a terrible piece of code. In procmon I’ve watched it close file handles it didn’t open, causing permissions violations down the line. (?!) Sometimes it silently corrupts your EXEs and DLLs too. This may be a known weakness in UpdateResource. Yay! (Thanks to Kevin Gadd; he was instrumental in diagnosing these bugs.) mt.exe may or may not be fixed in recent Visual Studios. Either way, I’m convinced mt.exe has caused us several intermittent build failures in the past. Avoiding it is a good thing.

  • Linking to a temporary file (foo.pre.dll), generating a resource script (foo.pre.rc) from (foo.pre.dll.manifest), compiling said resource script (foo.pre.res), and including the compiled resource into the final link (foo.dll).

What went wrong: This approach is reliable but slow. Linking each DLL and EXE twice, even if both links are incremental, is often slower than just doing a full link to begin with.

  • Linking foo.dll with foo.dll.manifest (via a resource script, as above) if it exists. If foo.dll.manifest changed as a result of the link, relink.

I didn’t actually try this one because non-DAG builds scare me. I like the simplicity and reliability of the “inputs -> command -> outputs” build model. It’s weird if foo.dll.manifest is an input and an output of the link. Yes, technically, that’s how incremental linking works at all, but the non-DAG machinery is hidden in link.exe. From SCons’s perspective, it’s still a DAG.

Finally, a working solution:

For every build configuration {debug,release} and dependency {CRT,MFC,…}, link a tiny program to generate that dependency’s manifest. Generate a resource script from the manifest, compile it (.rc -> .res), and link the compiled manifest resource into your other DLLs and EXEs.

This approach has several advantages:

  • These pre-generated manifest resources are created once and reused in future builds, with no impact on build time.
  • The build is a DAG.
  • We avoid letting mt.exe wreak havoc on our build by sidestepping it entirely.

I can think of one disadvantage: you need to know up front which SxS DLLs you depend on. For most programs, the CRT is the only one. And understanding your dependencies isn’t a bad thing anyway. ;)

After several evenings of investigation, we’re back to the same link times we had with Visual C++ 6! Yay!

The Code

If you care, here’s our SCons implementation of embedded manifests:

# manifest_resource(env, is_dll) returns a manifest resource suitable for inclusion into
# the sources list of a Program or SharedLibrary.
manifest_resources = {}
def manifest_resource(env, is_dll):
    if is_dll:
        resource_type = 2 #define ISOLATIONAWARE_MANIFEST_RESOURCE_ID 2
    else:
        resource_type = 1 #define CREATEPROCESS_MANIFEST_RESOURCE_ID  1

    is_debug = env['DEBUG'] # could use a 'build_config' key if we had more than debug/release
    del env

    def build_manifest_resource():
        if is_debug:
            env = baseEnv.Clone(tools=[Debug])
        else:
            env = baseEnv.Clone(tools=[Release])

        if is_dll:
            linker = env.SharedLibrary
            target_name = 'crt_manifest.dll'
            source = env.File('#/MSVC/crt_manifest_dll.cpp')
        else:
            linker = env.Program
            target_name = 'crt_manifest.exe'
            source = env.File('#/MSVC/crt_manifest_exe.cpp')

        env['OUTPUT_PATH'] = '#/${BUILDDIR}/${IMVU_BUILDDIR_NAME}/%s' % (target_name,)

        obj = env.SharedObject('${OUTPUT_PATH}.obj', source)
        result = linker([env.File('${OUTPUT_PATH}'), '${OUTPUT_PATH}.manifest'], obj)
        manifest = result[1]

        def genrc(env, target, source):
            [target] = target
            [source] = source
            # 24 = RT_MANIFEST
            file(target.abspath, 'w').write('%d 24 "%s"' % (resource_type, source.abspath,))

        rc = env.Command('${OUTPUT_PATH}.rc', manifest, genrc)
        res = env.RES('${OUTPUT_PATH}.res', rc)
        env.Depends(res, manifest)
        return res
    key = (is_debug, resource_type)
    try:
        return manifest_resources[key]
    except KeyError:
        res = build_manifest_resource()
        manifest_resources[key] = res
        return res

Fast Builds: Unintrusive Precompiled Headers (PCH)

Fast builds are critical to the C++ programmer’s productivity and happiness. One common technique for reducing build times is precompiled headers (PCH). There’s plenty of literature out there; I won’t describe PCH in detail here.

But one thing that’s always bothered me about PCH is that it affects your code. #pragma hdrstop and #include "StdAfx.h" everywhere. Gross.

I’m a strong believer in clean code without boilerplate, so can’t we do better? Ideally we could make a simple tweak to the build system and see build times magically improve. Enno enticed me with mentions of his fast builds, so I took a look…

Using PCH in Visual C++ requires a header (call it Precompiled.h) that includes all of the expensive dependencies:

#include <vector>
#include <map>
#include <iostream>
#include <fstream>
#include <boost/python.hpp>
#include <windows.h>
#include <mmsystem.h>

Additionally, we need a source file (let’s get creative and call it Precompiled.cpp), which is empty except for #include "Precompiled.h".

Compile Precompiled.cpp with /Yc Precompiled.h to generate Precompiled.pch, the actual precompiled header. Then, use the precompiled header on the rest of your files with /Yu Precompiled.h.

OK, here’s the step that prevented me from using PCH for so long: every single source file in your project must #include "Precompiled.h" on its first line.

That’s ridiculous! I don’t want to touch every file!

It turns out our savior is the /FI option. From the documentation:

This option has the same effect as specifying the file with double quotation marks in an #include directive on the first line of every source file specified on the command line […]

Exactly what we want!

But wait, doesn’t that mean every .cpp in our project will have access to every symbol included by the PCH? Yes. :( It’s worth the build speedup.

However, explicit physical dependencies are still important, and the only way to keep them from breaking is to block commits when they do. Since enabling and disabling PCH requires no code changes, it’s easy to add a “disable PCH” option to your build system and run it as a separate “Compile without PCH” step on your continuous integration server.

If somebody uses std::string but forgets to #include <string>, the build will fail and block commits.

In the end, here’s the bit of SCons magic that lets me quickly drop PCH into a project:

# PCH_ENABLED stands in for whatever "disable PCH" option your build exposes
# (the flag name here is mine, not part of SCons).
def enable_pch(env, source_file, header):
    if PCH_ENABLED:
        PCH, PCH_OBJ = env.PCH(source_file)
        env['PCH'] = PCH
        env['PCHSTOP'] = header
        env.Append(CPPFLAGS=['/FI' + header])
        return [PCH_OBJ]
    else:
        return [source_file]

Now you can benefit from fast builds with minimal effort and no change to your existing code!