Scalable Build Systems: An Analysis of Tup

by Chad Austin on June 10th, 2010

I previously argued that any tool whose running time is proportional with the number of files in a project scales quadratically with time. Bluem00 on Hacker News pointed me towards Tup, a scalable build system with goals similar to ibb.

Mike Shal, Tup’s author, wrote Build System Rules and Algorithms, formalizing the algorithmic deficiencies with existing build systems and describing Tup’s implementation, a significant improvement over the status quo. I would like to document my analysis of Tup and whether I think it replaces ibb.

Before we get started, I’d like to thank Mike Shal for being receptive to my comments. I sent him a draft of my analysis and his responses were thoughtful and complete. With his permission, I have folded his thoughts into the discussion below.

Is Tup suitable as a general-purpose build system? Will it replace SCons or Jam or Make anytime soon? Should I continue working on ibb?

Remember our criteria for a scalable build system, one that enables test-driven development at arbitrary project sizes:

  1. O(1) no-op builds
  2. O(changes) incremental builds
  3. Accessible dependency DAG atop which a variety of tools can be built

Without further ado, my thoughts on Tup follow:

Syntax

Tup defines its own declarative syntax, similar to Make or Jam. At first glance, the Tup syntax looks semantically equivalent to Make. From the examples:

: hello.c |> gcc hello.c -o hello |> hello

Read the dependency graph from left to right: hello.c is compiled by gcc into a hello executable. Tup supports variable substitution and limited flow control.

Build systems are inherently declarative, but I think Tup’s syntax has two flaws:

  1. Inventing a new syntax unnecessarily slows adoption: by implementing GNU Make’s syntax, Tup would be a huge drop-in improvement to existing build systems.
  2. Even though specifying dependency graphs is naturally declarative, I think a declarative syntax is a mistake. Build systems are a first-class component of your software and your team’s workflow. You should be able to develop them in a well-known, high-level language such as Python or Ruby, especially since those languages come with rich libraries. As an example, SCons gets this right: it’s trivial for me to write CPU autodetection logic for parallel builds in a build script if that makes sense. Or I can extend SCons’s Node system to download source files from the web.

Implementation Language

Tup is 15,000 lines of C. There’s no inherent problem with C, but I do think a community-supported project is more likely to thrive in a faster and safer language, such as Python or Ruby. Having worked with teams of engineers, it’s clear that most engineers can safely work in Python with hardly any spin-up time. I can’t say the same of C.

Git is an interesting case study: The core performance-sensitive data structures and algorithms are written in C, but many of its interesting features are written in Perl or sh, including git-stash, git-svn, and git-bisect. Unlike Git, I claim Python and Ruby are plenty efficient for the entirety of a scalable build system. Worst case, the dependency graph could live in C and everything else could stay in Python.

Scanning Implicit Dependencies

The Tup paper mentions offhand that it’s trivial to monitor a compiler’s file accesses and thus determine its true dependencies for generating a particular set of outputs. The existing implementation uses a LD_PRELOAD shim to monitor all file accesses attempted by, say, gcc, and treats those as canonical input files. Clever!

This is a great example of lateral, scrappy thinking. It has a couple huge advantages:

  1. No implicit dependencies (such as C++ header file includes) need be specified — if all dependencies come from the command line or a file, Tup will know them all.
  2. It’s easy to implement. Tup’s ldpreload.c is a mere 500 lines.

And a few disadvantages:

  1. Any realistic build system must treat Windows as a first-class citizen. Perhaps, on Windows, Tup could use something like Detours. I’ll have to investigate that.
  2. Intercepting system calls is reliable when the set of system calls is known and finite. However, there’s nothing stopping the OS vendor from adding new system calls that modify files.
  3. Incremental linking / external PDB files: these Visual C++ features both read and write the same file in one compile command. SCons calls this a SideEffect: commands that share a SideEffect cannot parallelize. A build system that does not support incremental linking or external symbols would face resistance among Visual C++ users.

And some open questions:

  1. I haven’t completely thought this through, but it may be important to support user-defined dependency scanners that run before command execution, enabling tools such as graph debugging.
  2. I don’t have a realistic example, but imagine a compiler that reads spurious dependency changes from run to run; say, a compiler that only checks its license file on every other day.

Stepping back, I think the core build system should not be responsible for dependency scanning. By focusing on dependency graph semantics and leaving dependency scanning up to individual tools (which may or may not use LD_PRELOAD or similar techniques), a build system can generalize to uses beyond compiling software, as I mentioned in my previous blog post.

Dependency Graph

Tup’s dependency DAG contains two types of nodes: Commands and Files. Files depend on Commands and Commands depend on other Files. I prefer Tup’s design over SCons’s DAG-edges-are-commands design for two reasons:

  1. It simplifies the representation of multiple-input multiple-output commands.
  2. Some commands, such as “run-test foo” or “search-regex some.*regex” depend on source files but produce no files as output. Since they fit naturally into the build DAG, commands are a first-class concept.

Build Reliability

Tup, like SCons, places a huge emphasis on build reliability. This is key and I couldn’t agree more. In the half-decade I’ve used SCons, I can count the number of broken builds on one hand. Sadly, many software developers are used to typing “make clean” or clicking “full rebuild” when something is weird. What a huge source of waste! Developers should trust the build system as much as their compiler, and the build system should go out of its way to help engineers specify complete and accurate dependencies.

Reliable builds imply:

  1. Changes are tracked by file contents, not timestamps.
  2. The dependency graph, including implicit dependencies such as header files and build commands, is complete and accurate by default.
  3. Compiler command lines are included in the DAG. Put another way: if the command used to build a file changes, the file must be rebuilt.

Tup takes a strict functional approach and formalizes build state as a set of files and their contents. (I would argue build state also includes file metadata such as file names and timestamps, at least if the compiler uses such information.) If the build state does not change between invocations, then no work must be done.

Tup even takes build reliability one step further than SCons: If you rename a target file in the build script, Tup actually deletes the old built target before rebuilding the new one. Thus, you will never have stale target executables lying around in your build tree.

Nonetheless, there are situations where a project may choose to sacrifice absolute reliability for significant improvements in build speed, such as incremental linking discussed above.

Core vs. Community

A build system is a critical component of any software team’s development process. Since every team is different, it’s essential that a build system is flexible and extensible. SCons, for example, correctly chose to implement build scripts in a high-level language (Python) with a declarative API for specifying nodes and edges in the dependency graph.

However, I think SCons did not succeed at separating its core engine from its community. SCons tightly couples the underlying dependency graph with support for tools like Visual C++, gcc, and version control. The frozen and documented SCons API is fairly high-level while the (interesting) internals are treated as private APIs. It should be the opposite: a dependency graph is a narrow, stable, and general API. By simplifying and documenting the DAG API, SCons could enable broader uses, such as unit test execution.

Configuration

Like Tup’s author, I agree that build autoconfiguration (such as autoconf or SCons’s Configure support) should not live in the core build system. Autoconfiguration is simply an argument that build scripts should be specified in a general programming language and that the community should develop competing autoconfiguration systems. If a particular autoconfiguration system succeeds in the marketplace, then, by all means, ship it with your build tool. Either way, it shouldn’t have access to any internal APIs. Configuration mechanisms are highly environment-sensitive and are best maintained by the community anyway.

DAG post-process optimizations

Another argument for defining a build tool in a general-purpose language is to allow user-defined DAG optimizations and sort orders. I can think of two such use cases:

  1. Visual C++ greatly improves compile times when multiple C++ files are specified on one command line. In fact, the benefit of batched builds can exceed the benefit of PCH. A DAG optimizer would search for a set of C++ source files that produce object files in the same directory and rewrite the individual command lines into one.
  2. When rapidly iterating, it would be valuable for a build system or test runner to sort such that the most-recently-failed compile or test runs first. However, when hunting test interdependencies as part of a nightly build, you may want to shuffle test runs. On machines with many cores but slow disks, you want to schedule expensive links as soon as possible to mitigate the risk that multiple will execute concurrently and thrash against your disk.

Conclusion

Tup is a significant improvement over the status quo, and I have personally confirmed its performance — it’s lightning fast and it scales to arbitrary project sizes.

However, without out-of-the-box Windows support, a mainstream general-purpose language, and a model for community contribution, I don’t see Tup rapidly gaining traction. With the changes I suggest, it could certainly replace Make and perhaps change the way we iterate on software entirely.

Next, I intend to analyze prebake.

3 Responses to “Scalable Build Systems: An Analysis of Tup”

  1. Martin Says:

    Nice post! So are you going to continue working on ibb?

  2. Chad Austin Says:

    I expect to, though at a slow pace. Do you have ideas on how to increase its development progress without requiring more of my time?

  3. Martin Says:

    Sadly, not very concrete ones. Delegate tasks to willing and capable people? Not guaranteed to reduce work…

Leave a Reply