Your Version Control and Build Systems Don’t Scale

Let’s spend a minute and talk about the performance of a tool you use every day: your build system. For this purpose, they’re all the same, so let’s assume you use GNU Make. For a no-op build, what is Make’s running time? You know, that O(N) stuff.

If you said O(Files), you’re right. Every time you type make and press enter, it scans every file in the project, looking for out-of-date sources. That doesn’t sound so bad. Linear algorithms are good, right?

Oh, but projects grow over time, typically in proportion with the number of engineers (even if some are net negative). Since projects grow:

O(Files) = O(Engineers * Time)

Assuming you’re successful, your team is probably growing too:

O(Engineers) = O(Time)

Applying some substitutions:

O(Make) = O(Files) = O(Engineers * Time) = O(Time^2)

A no-op make is quadratic in the duration of the project?!

Exhibit A, Mozilla:

Chad@laptop /c/xulrunner/mozilla-central/obj-xulrunner
$ time make
rm -f -rf ./dist/sdk
[tons of output snipped]
make[1]: Leaving directory `/c/xulrunner/mozilla-central/obj-xulrunner'

real    21m31.526s
user    4m13.292s
sys     9m15.066s

21.5 MINUTES! This is a no-op build! I changed nothing from the last build!

Mozilla’s build is an especially egregious example, but it’s not uncommon for nontrivial Make, SCons, or Jam no-op builds to exceed 15 seconds. Every time you run SCons, for example, it:

  • loads Python,
  • imports a bunch of Python code,
  • creates an entire dependency graph in memory,
  • scans all files for changes,
  • determines what to build,
  • builds it,
  • and throws all of those intermediate results away.

Am I the only person who thinks this situation is insane? We’re spending all of our time improving constant factors rather than addressing fundamentally quadratic build algorithms.

No-op builds should be O(1) and instantaneous, and most other builds should be O(WhateverChanged). What if I told you this is possible?

Introducing IBB: I/O-Bound Build

Dusty, one of the best engineers I know, once told me that C compilers are now fast enough that they’re I/O-bound, not CPU-bound as you’d expect. This was inspiration for a build system dependency engine architecture:

  • A long-running server process that contains a complete dependency graph between files, including the commands required to update targets from sources.
  • An ultra-tiny C program that communicates build requests to the build server.
  • A background thread that watches for filesystem updates (via ReadDirectoryChangesW or libfam) and updates the dependency graph in memory.

I have built a prototype of ibb’s architecture, and its code is available on my github.

In ibb, a no-op build takes constant time, no matter the size of the project. I tested with Noel Llopis’s build system stress test and no-op builds are merely 50 milliseconds. Take that, everyone else!

A Faster Grep

Let’s say you wanted to run a regular expression across your entire codebase. On Windows, you’ll spend all of your time in kernel calls, even if the files are in cache. However, remember that modern memory bandwidths are measured in GB/s…

ibb’s file search implementation can run a regular expression across 200 MB of PHP and JavaScript in 100 milliseconds. It copies file data into memory on the first run and rapidly scans across that memory on future runs. It could just as easily mmap and use OS’s disk cache to avoid kernel calls.

Minimal Test Runner

Carl Masak’s inspirational addictive TDD harness is also possible with ibb. If you can determine which unit tests depend on which source files (via an __import__ hook in Python, say), you can run appropriate tests immediately after saving a source file.

Version Control

I’ve watched both git and Subversion fall apart as a codebase grows. git is so faaast… until you import a 20 GB repository. Then it’s O(Files) just like everything else.

$ time git status
# On branch master
nothing to commit (working directory clean)

real    0m9.603s
user    0m0.858s
sys     0m7.378s

10 seconds to see what files I’ve modified. Ouch.

Imagine if ibb could keep track of which files have changed…

What This Means For You

Running tests, grepping code, and building binaries from source are equivalent problems. ibb’s architecture supports lightning-fast implementations of all of the above.

By converting an O(Time^2) build, test runner, or version control system to O(1), we eliminate hours of costly engineer time and encourage good habits like committing often and frequently running tests, no matter the age of the project. Frankly, instant feedback feels great.

It’s crazy to me that this architecture isn’t standard practice across development tools.

Take ibb’s concepts or code, integrate it with your engineering processes, and improve the quality of your team’s life today.