Code Reviews: Follow the Data

This is a mirror of the corresponding post at the IMVU Engineering Blog.

After years of reviewing other people’s code, I’d like to share a couple practices that improve the effectiveness of code reviews.

Why Review Code?

First, why review code at all? There are a few reasons:

  • Catching bugs
  • Enforce stylistic consistency with the rest of the code
  • Finding opportunities to share code across systems
  • Helping new engineers spin up with the team and project
  • Helping API authors see actual problems that API consumers face
  • Maintain the health of the system overall

It seems most people reach for one of two standard techniques when reviewing code; they either review diffs as they arrive or they review every change in one large chunk. We’ve used both techniques, but I find there’s a more effective way to spend code review time.

Reviewing Diffs

It seems the default code review technique for most people is to sign up for commit emails or proposed patches and read every diff as it goes by. This has some benefits – it’s nice to see when someone is working in an area or that a library had a deficiency needing correction. However, on a large application, diffs become blinding. When all you see is streams of diffs, you can lose sight of how the application’s architecture is evolving.

I’ve ended up in situations where every diff looked perfectly reasonable but, when examining the application at a higher level, key system invariants had been broken.

In addition, diffs tends to emphasize small, less important details over more important integration and design risks. I’ve noticed that, when people only review diffs, they tend to point out things like whitespace style, how iteration over arrays is expressed, and the names of local variables. In some codebases these are important, but higher-level structure is often much more important over the life of the code.

Reviewing diffs can also result in wasted work. Perhaps someone is iterating towards a solution. The code reviewer may waste time reviewing code that its author is intending to rework anyway.

Reviewing Everything

Less often, I’ve seen another code review approach similar to reviewing diffs, but on entire bodies of work at a time. This approach can work, but it’s often mindnumbing. See, there are two types of code: core, foundational code and leaf code. Foundational code sits beneath and between everything else in the application. It’s important that it be correct, extensible, and maintainable. Leaf code is the specific functionality a feature needs. It is likely to be used in a single place and may never be touched again. Leaf code is not that interesting, so most of your code review energy should go towards the core pieces. Reviewing all the code in a project or user story mixes the leaf code in with the foundational code and makes it harder see exactly what’s going on.

I think there’s a better way to run code reviews. It’s not as boring, tends to catch important changes to core systems, and is fairly efficient in terms of time spent.

Follow the Data

My preferred technique for reviewing code is to trace data as it flows through the system. This should be done after a meaningful, but not TOO large, body of work. You want about as much code as you can review in an hour: perhaps more than a user story, but less than an entire feature. Start with a single piece of data, say, some text entered on a website form. Then, trace that data all the way through the system to the output. This includes any network protocols, transformation functions, text encoding, decoding, storage in databases, caching, and escaping.

Following data through the code makes its high-level structure apparent. After all, code only exists to transform data. You may even notice scenarios where two transformations can be folded into one. Or perhaps eliminated entirely — sometimes abstraction adds no value at all.

This style of code review frequently covers code that wasn’t written by the person or team that initiated the code review. But that’s okay! It helps people get a bigger picture, and if the goal is to maintain overall system health, new code and existing shouldn’t be treated differently.

It’s also perfectly fine for the code review not to cover every new function. You’ll likely hit most of them while tracing the data (otherwise the functions would be dead code) but it’s better to emphasize the main flows. Once the code’s high-level structure is apparent, it’s usually clear which functions are more important than others.

After experimenting with various code review techniques, this approach has been the most effective and reliable over time. Make sure code reviews are somewhat frequent, however. After completion of every “project” or “story” or “module” or whatever, sit down for an hour with the code’s authors and appropriate tech leads and review the code. If the code review takes longer than an hour, people become too fatigued to add value.

Handling Code Review Follow-Up Tasks

At IMVU in particular, as we’re reviewing the code, someone will rapidly take notes into a shared document. The purpose of the review meeting is to review the code, not discuss the appropriate follow-up actions. It’s important not to interrupt the flow of the review meeting with a drawn-out discussion about what to do about one particular issue.

After the meeting, the team leads should categorize follow-up tasks into one of three categories:

  1. Do it right now. Usually tiny tweaks, for example: rename a function, call a different API, delete some commented-out code, move a function to a different file.
  2. Do it at the top of the next iteration. This is for small or medium-sized tasks that are worth doing. Examples: fix a bug, rework an API a bit, change an important but not-yet-ubiquitous file format.
  3. Would be nice someday. Delete these tasks. Don’t put them in a backlog. Maybe mention them to a research or infrastructure team. Example: “It would be great if our job scheduling system could specify dependencies declaratively.”

Nothing should float around on an amorphous backlog. If they are important, they’ll come up again. Plus, it’s very tempting to say “We’ll get to it” but you never will, and even if you have time, nobody will have context. So either get it done right away or be honest with yourself and consciously drop it.

Now go and review some code! :)

High Order Bits (or: mostly correct but in the right direction)

First, the moral: Unit tests are good. But reliable design is better.

Even if you have to deal with short-term pain. Even if you haven’t figured out all of the edge cases.

Let me back up. I love automated tests. I’ve been test-driving code at IMVU since I started. We buy new engineers a copy of Test-Driven Development: By Example. Whenever there is a bug, we write tests to make sure it never happens again.

After years of working this way, seeing projects succeed and fail, I’d like to refine my perspective. Let me share a story.

IMVU was originally a bolt-on addition to AOL Instant Messenger. Two IMVU clients communicated with each other by manipulating AOL IM’s UI and scanning the window for new text messages, much like a screen reader would. This architecture propagated some implications through our entire codebase:

1) The messaging layer was inherently unreliable. AOL IM chat windows could be manipulated by the user or other programs. Thus, our chat protocol was built around eventual consistency.

2) We could not depend on an authoritative source of truth. Since text-over-IM is peer-to-peer, no client has a true view into where all of the avatars are sitting or who currently owns the room.

Thus, in 2008, long after we’d dropped support for integration with third-party IM clients and replaced it with an authoritative state database, we continued to have severe state consistency bugs. The client’s architecture still pretended like the chat protocol was unreliable and state was peer-to-peer instead of authoritative.

To address these bugs, we wrote copious test coverage. These were deep tests: start up a local Apache and MySQL instance, connect a couple ClientApp Python processes to them, have one invite another to chat, and assert that their scene graphs were consistent. We have dozens of these tests, for all kinds of edge cases. And we thought we’d fixed the bugs for good…

But the bugs returned. These tests are still running and passing, mind you, but differences in timing and sequencing result in the same state consistency issues we saw in 2008. It’s clear that test coverage is not sufficient to prevent these types of bugs.

So what’s the other ingredient in reliable software? I argue that, in agile software development, correct-by-design systems are underemphasized.

Doesn’t Test Driven Development guide me to build correct-by-design systems?

TDD prescribes a “red, green, refactor” rhythm, where you write a failing test, do the simplest work to make it pass, and then refactor the code so it’s high quality. TDD helps you reach the “I haven’t seen it fail” stage, by verifying that yes, your code can pass these tests. But just because you’ve written some tests doesn’t mean your code will always work.

So there’s another level of reliability: “I have considered the ways it can fail, but I can’t think of any.” This statement is stronger, assuming you’re sufficiently imaginative. Even still, you won’t think of everything, especially if you’re working at the edge of your human capacity (as you should be).

Nonetheless, thoughtfulness is better than nothing. I recommend adding a fourth step to your TDD rhythm: “Red, green, refactor, what else could go wrong?” In that fourth step, you deeply examine the code and think of additional tests to write.

The strongest level of software correctness is not about finding possible failure conditions; it’s about proving that your system works in the presence of all inputs. Correctness proofs for non-trivial algorithms are too challenging for all of the code we write, but in a critical subsystem like chat state management, the time spent on a lightweight proof will easily pay for itself. Again, I’m not advocating that we always prove the correctness of our software, but we should at least generally be convinced of its correctness and investigate facts that indicate otherwise. TDD by itself is not enough.

OK, so we can’t easily test-drive or refactor our way out of the chat system mess we got ourselves into, because it’s simply too flawed, so what can we do? The solution is especially tricky, because in situations like this, there are always features that depend on subtleties of the poor design. A rewrite would break those features, which is unacceptable, right? Even if breaking those features is acceptable to the company, there are political challenges. Imagine the look on your product owner’s face when you announce “Hey I have a new architecture that will break your feature but provide no customer benefit yet.”

The ancient saying “You can’t make an omelette without breaking some eggs” applies directly here. Preserving 100% feature compatibility is less important than fixing deep flaws.

Why? High-order bits are hardest to change, but in the end, are all that matters. The low-order bits are easy to change, and any competent organization will fix the small things over time.

I can’t help but recall the original iPhone. Everyone said “What?! No copy and paste?!” Indeed, the iPhone couldn’t copy and paste until 18 months and two major OS releases later. Even still, the iPhone reshaped the mobile industry. Clearly 100% feature compatibility is not a requirement for success.

My attitude towards unit testing has changed. While I write, run, and love unit testing, I value correct-by-design subsystems even more. When it comes down to it, tests are low-order bits compared to code that just doesn’t break.

For those curious, what are we doing about the chat system? I’ll let Jon’s GDC presentation speak for itself.

How do you balance Immediate and Meta?

I like to split work into two types, Immediate and Meta, which I define as follows:

  • Immediate: work that moves us closer to an immediate goal. Examples: bug fixes, development of a feature in a software application
  • Meta: work that improves or changes how we perform Immediate work. Examples: planning, refactoring, employee training, development of programming languages, and communication between teams

Clearly you cannot focus entirely on one or the other. If we focused entirely on immediate work, we would all be programming in assembly language with no communication between engineers. On the other hand, if we spent all of our time planning or writing better programming languages, we’d never sell any product.

Customers directly pay for the results of Immediate work and indirectly how fast you can deliver it.

Meta is tooling and process. ROI determines your investment in Meta. At the conception of a startup, you don’t know your business model or market, so investing in tooling is probably foolhardy. You can’t afford new programming languages or side innovations, so get ‘er done.

However, successfully raising funds changes that equation. With a longer horizon, investing in your iteration loop and tools begins to make sense. It seems Meta ROI is, roughly, ExpectedImmediateTime * ImprovementFactor / CostOfImprovement.

Additionally, Meta work applies to itself. Complex systems are like a dark cave — every step down a path illuminates further steps. You can also think of this effect as “leveling up”, like in a video game. If your team begins Test-Driven Development or Continuous Deployment early, it will begin to see further improvements, compounding the benefits. Process improvement has exponential return (as any Singularitarian is quick to mention).

So how can you balance these types of work? Let’s discuss several approaches:

Start 100% Immediate and Asymptotically Approach 100% Meta

Upon learning to program, students invariably have the attitude “How do I quickly solve the problem at hand?” After witnessing the limits of that attitude as their non-trivial programs collapse under their own complexity, their attitude shifts to “How can I make programming easier?” From this you see such rarely-fruitful projects as new operating systems and programming languages.

(This is the story of my life… I haven’t finished writing a single game since I learned a language powerful enough to metaprogram.)

Always Balance 50% Immediate and 50% Meta

It seems like there should be an optimal balance of process improvement and immediate value creation. Maybe that’s true in the long run, but, in my experience, we punctuate work with process improvement. Intel has its Tick Tock approach. IMVU periodically makes comparitively large investments in build processes, separated by periods of smooth execution. Students go to school for part of the year — process improvement — and work in the summers or co-ops. (However, I’d argue that applying tick tock more directly to education would be beneficial: a year of school and a year of work. That’s another blog post.)

Nintendo’s “Spiral” Analogy

If Nintendo focused entirely on immediate results, they would end up in a death spiral: financial pressure reduces available time, limited time reduces quality, poor quality causes lower game sales, and low game sales increases financial pressure.

If you have the latitude to pay attention to process improvement, you will instead follow an “Upward Spiral”. Quality and innovation drive sales, giving you more time to spend on quality and innovation.

I Don’t Have the Answer

How do you balance Immediate and Meta? Please comment – I’d love to hear your thoughts.

Scalable Build Systems: An Analysis of Tup

I previously argued that any tool whose running time is proportional with the number of files in a project scales quadratically with time. Bluem00 on Hacker News pointed me towards Tup, a scalable build system with goals similar to ibb.

Mike Shal, Tup’s author, wrote Build System Rules and Algorithms, formalizing the algorithmic deficiencies with existing build systems and describing Tup’s implementation, a significant improvement over the status quo. I would like to document my analysis of Tup and whether I think it replaces ibb.

Before we get started, I’d like to thank Mike Shal for being receptive to my comments. I sent him a draft of my analysis and his responses were thoughtful and complete. With his permission, I have folded his thoughts into the discussion below.

Is Tup suitable as a general-purpose build system? Will it replace SCons or Jam or Make anytime soon? Should I continue working on ibb?

Remember our criteria for a scalable build system, one that enables test-driven development at arbitrary project sizes:

  1. O(1) no-op builds
  2. O(changes) incremental builds
  3. Accessible dependency DAG atop which a variety of tools can be built

Without further ado, my thoughts on Tup follow:


Tup defines its own declarative syntax, similar to Make or Jam. At first glance, the Tup syntax looks semantically equivalent to Make. From the examples:

: hello.c |> gcc hello.c -o hello |> hello

Read the dependency graph from left to right: hello.c is compiled by gcc into a hello executable. Tup supports variable substitution and limited flow control.

Build systems are inherently declarative, but I think Tup’s syntax has two flaws:

  1. Inventing a new syntax unnecessarily slows adoption: by implementing GNU Make’s syntax, Tup would be a huge drop-in improvement to existing build systems.
  2. Even though specifying dependency graphs is naturally declarative, I think a declarative syntax is a mistake. Build systems are a first-class component of your software and your team’s workflow. You should be able to develop them in a well-known, high-level language such as Python or Ruby, especially since those languages come with rich libraries. As an example, SCons gets this right: it’s trivial for me to write CPU autodetection logic for parallel builds in a build script if that makes sense. Or I can extend SCons’s Node system to download source files from the web.

Implementation Language

Tup is 15,000 lines of C. There’s no inherent problem with C, but I do think a community-supported project is more likely to thrive in a faster and safer language, such as Python or Ruby. Having worked with teams of engineers, it’s clear that most engineers can safely work in Python with hardly any spin-up time. I can’t say the same of C.

Git is an interesting case study: The core performance-sensitive data structures and algorithms are written in C, but many of its interesting features are written in Perl or sh, including git-stash, git-svn, and git-bisect. Unlike Git, I claim Python and Ruby are plenty efficient for the entirety of a scalable build system. Worst case, the dependency graph could live in C and everything else could stay in Python.

Scanning Implicit Dependencies

The Tup paper mentions offhand that it’s trivial to monitor a compiler’s file accesses and thus determine its true dependencies for generating a particular set of outputs. The existing implementation uses a LD_PRELOAD shim to monitor all file accesses attempted by, say, gcc, and treats those as canonical input files. Clever!

This is a great example of lateral, scrappy thinking. It has a couple huge advantages:

  1. No implicit dependencies (such as C++ header file includes) need be specified — if all dependencies come from the command line or a file, Tup will know them all.
  2. It’s easy to implement. Tup’s ldpreload.c is a mere 500 lines.

And a few disadvantages:

  1. Any realistic build system must treat Windows as a first-class citizen. Perhaps, on Windows, Tup could use something like Detours. I’ll have to investigate that.
  2. Intercepting system calls is reliable when the set of system calls is known and finite. However, there’s nothing stopping the OS vendor from adding new system calls that modify files.
  3. Incremental linking / external PDB files: these Visual C++ features both read and write the same file in one compile command. SCons calls this a SideEffect: commands that share a SideEffect cannot parallelize. A build system that does not support incremental linking or external symbols would face resistance among Visual C++ users.

And some open questions:

  1. I haven’t completely thought this through, but it may be important to support user-defined dependency scanners that run before command execution, enabling tools such as graph debugging.
  2. I don’t have a realistic example, but imagine a compiler that reads spurious dependency changes from run to run; say, a compiler that only checks its license file on every other day.

Stepping back, I think the core build system should not be responsible for dependency scanning. By focusing on dependency graph semantics and leaving dependency scanning up to individual tools (which may or may not use LD_PRELOAD or similar techniques), a build system can generalize to uses beyond compiling software, as I mentioned in my previous blog post.

Dependency Graph

Tup’s dependency DAG contains two types of nodes: Commands and Files. Files depend on Commands and Commands depend on other Files. I prefer Tup’s design over SCons’s DAG-edges-are-commands design for two reasons:

  1. It simplifies the representation of multiple-input multiple-output commands.
  2. Some commands, such as “run-test foo” or “search-regex some.*regex” depend on source files but produce no files as output. Since they fit naturally into the build DAG, commands are a first-class concept.

Build Reliability

Tup, like SCons, places a huge emphasis on build reliability. This is key and I couldn’t agree more. In the half-decade I’ve used SCons, I can count the number of broken builds on one hand. Sadly, many software developers are used to typing “make clean” or clicking “full rebuild” when something is weird. What a huge source of waste! Developers should trust the build system as much as their compiler, and the build system should go out of its way to help engineers specify complete and accurate dependencies.

Reliable builds imply:

  1. Changes are tracked by file contents, not timestamps.
  2. The dependency graph, including implicit dependencies such as header files and build commands, is complete and accurate by default.
  3. Compiler command lines are included in the DAG. Put another way: if the command used to build a file changes, the file must be rebuilt.

Tup takes a strict functional approach and formalizes build state as a set of files and their contents. (I would argue build state also includes file metadata such as file names and timestamps, at least if the compiler uses such information.) If the build state does not change between invocations, then no work must be done.

Tup even takes build reliability one step further than SCons: If you rename a target file in the build script, Tup actually deletes the old built target before rebuilding the new one. Thus, you will never have stale target executables lying around in your build tree.

Nonetheless, there are situations where a project may choose to sacrifice absolute reliability for significant improvements in build speed, such as incremental linking discussed above.

Core vs. Community

A build system is a critical component of any software team’s development process. Since every team is different, it’s essential that a build system is flexible and extensible. SCons, for example, correctly chose to implement build scripts in a high-level language (Python) with a declarative API for specifying nodes and edges in the dependency graph.

However, I think SCons did not succeed at separating its core engine from its community. SCons tightly couples the underlying dependency graph with support for tools like Visual C++, gcc, and version control. The frozen and documented SCons API is fairly high-level while the (interesting) internals are treated as private APIs. It should be the opposite: a dependency graph is a narrow, stable, and general API. By simplifying and documenting the DAG API, SCons could enable broader uses, such as unit test execution.


Like Tup’s author, I agree that build autoconfiguration (such as autoconf or SCons’s Configure support) should not live in the core build system. Autoconfiguration is simply an argument that build scripts should be specified in a general programming language and that the community should develop competing autoconfiguration systems. If a particular autoconfiguration system succeeds in the marketplace, then, by all means, ship it with your build tool. Either way, it shouldn’t have access to any internal APIs. Configuration mechanisms are highly environment-sensitive and are best maintained by the community anyway.

DAG post-process optimizations

Another argument for defining a build tool in a general-purpose language is to allow user-defined DAG optimizations and sort orders. I can think of two such use cases:

  1. Visual C++ greatly improves compile times when multiple C++ files are specified on one command line. In fact, the benefit of batched builds can exceed the benefit of PCH. A DAG optimizer would search for a set of C++ source files that produce object files in the same directory and rewrite the individual command lines into one.
  2. When rapidly iterating, it would be valuable for a build system or test runner to sort such that the most-recently-failed compile or test runs first. However, when hunting test interdependencies as part of a nightly build, you may want to shuffle test runs. On machines with many cores but slow disks, you want to schedule expensive links as soon as possible to mitigate the risk that multiple will execute concurrently and thrash against your disk.


Tup is a significant improvement over the status quo, and I have personally confirmed its performance — it’s lightning fast and it scales to arbitrary project sizes.

However, without out-of-the-box Windows support, a mainstream general-purpose language, and a model for community contribution, I don’t see Tup rapidly gaining traction. With the changes I suggest, it could certainly replace Make and perhaps change the way we iterate on software entirely.

Next, I intend to analyze prebake.

10 Pitfalls of Dirty Code

10 Pitfalls of Dirty Code

(Disclaimer: These are my opinions and not the opinions of IMVU or its founders. I’m sure we all have different perspectives.)

A History of IMVU’s Development Process

IMVU was started with a particular philosophy: We don’t know what
customers will like, so let’s rapidly build a lot of different stuff
and throw away what doesn’t work. This was an

effective approach
to discovering a business by using a
sequence of product prototypes to get early customer feedback. The
first version of the 3D IMVU client took about six months to build,
and as the founders iterated towards a compelling user experience, the
user base grew monthly thereafter.

This development philosophy created a culture around rapid prototyping
of features, followed by testing them against large numbers of actual
customers. If a feature worked, we’d keep it. If it didn’t, we’d
trash it.

It would be hard to argue against this product development strategy,
in general. However, hindsight indicates we forgot to do something
important when developing IMVU: When the product changed, we did not
update the code to reflect the new product, leaving us with piles of
dirty code.

Dirty Code

What makes code dirty? Really, anything that prevents people from making changes to the system. Examples include code with:

  • unclear or too many responsibilities,
  • overly complicated or obscure control flow,
  • concepts that don’t map to the domain,
  • too many dependencies,
  • global state,
  • or duplicated logic.

In short, if you hire someone who’s clearly smart and they say “I
can’t make sense of this”, then you probably have a dirty code

You’ll sometimes hear of a technical debt
metaphor. Technical debt is a way to think about the cost of
introducing dirty code as you’ll need to maintain it in the future.
Knowingly introducing dirty code lets you quickly test a hypothesis or
learn some information, so you’re not investing in code you won’t
need. For code you will need, however, technical debt compounds on
itself and rapidly becomes more expensive than the original cost of
fixing it.

Taking on technical debt can be the right decision, but it’s important
to remember that you’re introducing work for someone down the road.
Moreover, only programmers have a good grasp on the true cost of
technical debt. The business will never decide to refactor over
pursuing the next shiny project (they don’t have enough information).
Your engineering organization must be empowered to do what it thinks
best for the business and technology platform. If the term “technical
debt” ever shows up in a strategic plan, your engineering team has
failed to communicate the true costs of their work.

We used the technical debt metaphor quite a bit at IMVU, and, in
hindsight, it’s obvious that we underestimated the long-term costs by
at least an order of magnitude.

So what are the true costs of letting dirty code linger? Each of
these are drawn from real examples at IMVU.

Team Costs

  1. Dirty code does not scale to larger teams.

In a code base where modules and objects have unclear
responsibilities, programmers tend to implement features by modifying
code all over the system. This causes conflicts as multiple people
change the same files. Following the open-closed principle helps
here. Every object should do one thing well and have a clear
interface to the rest of the system. Ideally, each feature would fit
into its own object or set of objects, and plug into the system in a
standard way.

  1. Dirty code reduces team morale.

Most programmers I know get pleasure and validation by shipping code
that makes people happy. Any frustration that gets in the way of that
basic need reduces morale. If an improvement to the product seems
like it should be a three-hour task but takes two days of
investigation and pain, the programmer is unlikely to feel like they
can make a difference.

  1. Dirty code makes programmers slower.

If there are two systems for doing X, and you want to make an
improvement to the way X is done, you have to change both systems,
increasing effort and the risk of regression.

If the concepts in the domain don’t map to your objects, programmers
have to struggle to find the right place for new code.

If A and B are unrelated aspects of the system and the logic for A and
B are glommed together, changes to A involve understanding B too. The
more aspects are coupled together, the cost of changing each of those
aspects goes up.

  1. Dirty code inhibits the formation of an ownership culture.

When the code is too complicated for anyone to fit it in their heads,
programmers will tend to blame the legacy code or architecture for any
bugs or regressions that crop up. If they perceive it’s too expensive
to fix the architecture, they will not feel responsible if the product
ends up being low-quality. To build a sustainable, high-quality
product, the programmers ultimately need to feel responsible, and the
feedback loop between customers and programmers needs to be closed.

Product Costs

  1. If product concepts are not reflected in the code, programmers
    might implement features in ways that don’t make sense in the product.

To explain this, I’ll give an example from the IMVU client: The
business rules around product loading are complicated to begin with.
Worse, the code does not directly reflect said rules. Because of
this, our attempts to implement a better loading experience (including
progress bar and object prioritization) have failed multiple times,
and we still don’t have it quite right.

  1. Dirty code incentivizes the business to invest in tangential
    revenue work rather than attacking core business problems.

For most startups*, the primary product or core competency should
ultimately derive the most revenue. If management perceives it’s too
expensive to work on the core product, they will tend to fund
tangential work such as new payment methods or bizdev deals. There’s
a point where that kind of work makes sense (and pays for itself), but
the core offering should be the biggest lever, and the company should
rally around that.

* During Web 2.0, your mileage may vary.

Quality Costs

  1. Even with good automated test coverage, dirty code increases the
    risk of introducing regressions.

If a module has too many dependencies or responsibilities, changes to
it can have unintended consequences. Automated test coverage helps a
great deal, but think of test coverage as approximating (number of
tested conditions) / (number of possible conditions). The
combinatorially increased states in dirty code effectively reduces the
coverage of your automated (and manual!) tests, allowing regressions
to slip through to customers. Threads, if statements, and nullable
objects are all examples of ways to reduce test coverage from the

  1. Wide or unclear dependencies reduce the quality of tests.

In our experience, unit tests around dirty code tend to involve lots
of mocks, partial mocks, and monkey patching, reducing the likelihood
that tests will actually catch regressions. Worse, these tests are
more likely to fail after benign refactorings. As we refactor our
objects to better reflect the actual domain (updating the tests as we
go), our confidence in the tests improve and they become dramatically
easier understand.

  1. Dirty code hides real bugs.

If the code is too complicated for the programmers to understand and
has too many possible states to effectively test, what makes you think
you can know whether it reliably works? In our experience, every
dirty module corresponds with the area of the product that our users
report intermittently breaks. When refactoring those systems, we
inevitably discover race conditions, bugs in edge cases, and
performance problems.

  1. Dirty code gets dirtier.

Finally, if your team does not get used to improving dirty code, it
will only get worse. Eventually, the programmers (and maybe even
management) will start calling for a rewrite (a.k.a. resetting the
shitty counter). Software design and refactoring are skills that take
practice. Honing them will keep your business and technology nimble.

Final Thoughts

I’ve talked a lot about why dirty code is expensive, so you may be
asking “Well, what can I do about it?” First, try to pay attention to
your code. After you finish writing some, ask yourself “Could I make
this clearer?” Then ask your neighbor the same question. Beyond
that, here are some resources that will help you improve your design

It’s definitely possible to keep that feeling of ‘newness’ in your
code. Hopefully I’ve convinced that you that the extra few hours or
days to clean up after every project easily pays for itself. Code is
the lifeblood of our industry – keep it clean!

Refactor first!

Test-driven development teaches the red-green-refactor
mantra, which works wonderfully when starting a project from scratch.
Chances are good, however, that you’ll spend 80% of your programming
career working in existing codebases. In existing code, I’ve found it
easier to refactor first until the desired behavior change is a
30-minute task. By refactoring first, you’ll…

  • better understand the code you’re about to change.
  • better understand the consequences of your change.
  • make it easier for the person reviewing your change to understand
    its impact. (It’s much easier to review several independent
    refactorings and one simple change than a nonobvious change followed
    by refactorings; or, worse, all of the above in one commit.)
  • reduce risk and save time by preventing unexpected bugs.

Red-green-refactor is great, but try refactoring first!

Quality With A Name

As IMVU and its engineering team have grown, we’ve encountered an increasing amount of overhead while implementing features that affect the original code for the client and web site (you may even call some of it prototype code). You know, this effort even goes back to when I started. Most of my early days at IMVU involved fixing bugs, adding test coverage, and making small changes to the product. There were a lot of things about the code that I didn’t particularly like, but I chalked that up to the first time I’ve ever had to deal intimately with a large code base that I did not write. That doesn’t mean I didn’t try to improve the things that I found; in fact, it’s been an excellent exercise in articulating engineering practices I’ve internalized so deeply that I take them for granted.

At IMVU, we practice test driven development, which means that we always* add a unit test exposing a bug in the system before fixing the bug. In practice, this means we should never see the same bug twice. Some areas of the code have seen more love than others, of course.

Recently, Andy has been trying to implement a tiny new feature in the buddy window. The kind of change that should only take 15 minutes to write the tests and maybe another 15 to implement. (And we wouldn’t have to worry about other features breaking, or other changes breaking this feature, and all the other TDD goodness.) That is, if we have good tests in that area of the code. But we don’t, so it’s not that easy. So he gets to be the lucky engineer, adding tests to this module, which is involving a lot of refactoring. The code’s style has a lot of internal consistency, and there definitely is a design, but adding tests around it has proven to be a huge time loss. So we’ve been trying to articulate why the code is such a pain. But James Shore explains it exactly:

“…the goodness of a design is inversely proportional to the cost of change.”

And there we go. That sums it up.

* Ideally. We’re always improving.


Creating a lot of technical debt quickly is not exactly what “rapid development” is supposed to mean. — dnicolet99 on Scrum mailing list

Scrum is great for either fixed-date variable-scope, or “fixed-scope” (which always grows) variable-date. If you’re doing fixed-date fixed-scope, I recommend waterfall or RUP, which will buy you a few months to look for a new job. — Michael James

Agile Databases

There is a subgroup of the agile community focused on agile database development and database refactoring. Just do a search. At IMVU, we definitely do agile development of our databases and the application layer code that talks to them. We expect most new engineers to be capable of writing a database schema and pushing it to production within a week of starting, with automated tests that let us change things without fear of breakage, of course. We have no traditional DBAs, although one of our engineers acts as one, reviewing schemas before they’re applied in production.

The database refactoring resources that we have seen have been deficient in one area: they assume that you have small amounts of data, or the ability to take down your site or service so that you can apply schema changes. (Which lock tables until the schema is applied.) The problem that we have not seen addressed yet is how to change table structure when you have millions of customers and heavy usage. Each hour the site is down translates into lost revenue. Yeah, there are ways to get around this, such as applying schemas to slaves and then failing the master over to the slave, but these require a fair amount of infrastructure and pain to support.

But if I could choose one thing to get from “the agile database community”, it would be a modification to MySQL that let you add or remove indices or columns in the background, even if that particular table was under heavy use, and even if the table had degraded performance during the alter. Maybe it could be implemented as a table with special metadata that shows how to get the data from the old table if it does not exist in the new one. And maybe Oracle supports this already, and I just want a new feature in MySQL.

Update: It turns out that if you have partitioned your customers across enough databases, and have some spare capacity, you can apply schema changes across your databases in parallel, and it’s not so bad. So maybe that’s how to be agile and have users. :)

Test Driven Development: First Impressions

Thanks to Noel Llopisgreat series of articles, I’m sold on test driven development. I’d like to spend a little time recording my first impressions — after only one week of experience. For maximum benefit, I recommend reading Noel’s articles before or after continuing.

Disclaimer: the product I’m working on, a programming language and its compiler, is particularly amenable to test driven development it’s very straightforward to see whether the code is doing the right thing or not. It’s less obvious how to apply test-driven development to some other areas, specifically user interface design and computer graphics. Now let’s get to my observations.

First, test-driven development is fast. You would think that having to right twice as much code, maybe more, would mean that TDD is slow. But this doesn’t take into account the fact that you’re debugging time is reduced to almost nothing, and the act of writing tests is nearly equivalent to designing software on paper or in your head. You don’t have to worry about writing good, or even correct, code, as long as the tests pass. For example, my compiler only supports two operators right now. In even those are hacked for specific data types. But that’s okay, because in test-driven development there is a mantra: untested code doesn’t work. It’s just like in the field of security: unproven code is insecure. If you want to see the other myriad advantages of having tests for every piece of your code, there’s plenty of literature out there.

Second, test-driven development is fun. The “find insufficiency” – “write test” – “see it fail” – “change code” – “repeat until it passes” iteration cycle is extremely rapid. And after every change, you get to see all the other tests pass, so you know you didn’t break anything. Instant gratification. It’s like a drug. I got more done last night than I have in some weeks, just because I couldn’t stop. TestInfected for sure.

Third, test-driven development breeds good code. The logic here is pretty simple: your code needs to be easy to test, so your objects become easy to instantiate and easy to use. Noel elaborates on this.

I’m not sure I can back this up, but in order to effectively practice TDD, I highly recommind using a good build system. Visual Studio might not cut it. I’m using SCons and I find that I almost never do a build without running the tests (typing “scons test”) as well. However! This may not be a limitation of TDD, since having a good build system is a good thing anyway. ;)

Now for some details about my particular setup. As I mentioned, I’m using one of SCons’ newer features, the ability to associate actions with Alias nodes. How to do this. I’m using Boost Test and some Python scripts. I have a thin wrapper around Boost Test to make it a little easier to read. (#define TEST BOOST_AUTO_UNIT_TEST and such.) I’m not sure I know how to follow Noel’s rule of “one check per test” yet. Most of my tests depend on values created in previous tests. For the same reason, I’m not sure how to effectively use fixtures yet, since I want to test the fixture setup and teardown too.

Now go try it.

p.s. I used Dragon NaturallySpeaking to write this, so if you see any weird words or grammar, please forgive. I tried my best to correct what I can see.