10 Pitfalls of Dirty Code
(Disclaimer: These are my opinions and not the opinions of IMVU or its founders. I'm sure we all have different perspectives.)
A History of IMVU's Development Process
IMVU was started with a particular philosophy: We don't know what customers will like, so let's rapidly build a lot of different stuff and throw away what doesn't work. This was an effective approach to discovering a business by using a sequence of product prototypes to get early customer feedback. The first version of the 3D IMVU client took about six months to build, and as the founders iterated towards a compelling user experience, the user base grew monthly thereafter.
This development philosophy created a culture around rapid prototyping of features, followed by testing them against large numbers of actual customers. If a feature worked, we'd keep it. If it didn't, we'd trash it.
It would be hard to argue against this product development strategy, in general. However, hindsight indicates we forgot to do something important when developing IMVU: When the product changed, we did not update the code to reflect the new product, leaving us with piles of dirty code.
Dirty Code
What makes code dirty? Really, anything that prevents people from making changes to the system. Examples include code with:
- unclear or too many responsibilities,
- overly complicated or obscure control flow,
- concepts that don't map to the domain,
- too many dependencies,
- global state,
- or duplicated logic.
In short, if you hire someone who's clearly smart and they say "I can't make sense of this", then you probably have a dirty code problem.
You'll sometimes hear of a technical debt metaphor. Technical debt is a way to think about the cost of introducing dirty code as you'll need to maintain it in the future. Knowingly introducing dirty code lets you quickly test a hypothesis or learn some information, so you're not investing in code you won't need. For code you will need, however, technical debt compounds on itself and rapidly becomes more expensive than the original cost of fixing it.
Taking on technical debt can be the right decision, but it's important to remember that you're introducing work for someone down the road. Moreover, only programmers have a good grasp on the true cost of technical debt. The business will never decide to refactor over pursuing the next shiny project (they don't have enough information). Your engineering organization must be empowered to do what it thinks best for the business and technology platform. If the term "technical debt" ever shows up in a strategic plan, your engineering team has failed to communicate the true costs of their work.
We used the technical debt metaphor quite a bit at IMVU, and, in hindsight, it's obvious that we underestimated the long-term costs by at least an order of magnitude.
So what are the true costs of letting dirty code linger? Each of the following is drawn from a real example at IMVU.
Team Costs
- Dirty code does not scale to larger teams.
In a code base where modules and objects have unclear responsibilities, programmers tend to implement features by modifying code all over the system. This causes conflicts as multiple people change the same files. Following the open-closed principle helps here. Every object should do one thing well and have a clear interface to the rest of the system. Ideally, each feature would fit into its own object or set of objects, and plug into the system in a standard way.
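As a rough illustration of features that "plug into the system in a standard way", here is a minimal Python sketch. The registry, the `feature` decorator, and the `wave`/`dance` handlers are all hypothetical names invented for this example; they are not IMVU code.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a feature name to the handler that implements it.
FEATURES: Dict[str, Callable[[str], str]] = {}


def feature(name: str):
    """Register a handler under `name` without editing any dispatch code."""
    def register(handler: Callable[[str], str]) -> Callable[[str], str]:
        FEATURES[name] = handler
        return handler
    return register


@feature("wave")
def wave(user: str) -> str:
    return f"{user} waves"


@feature("dance")
def dance(user: str) -> str:
    return f"{user} dances"


def run_feature(name: str, user: str) -> str:
    """Core dispatch: it does not change when a new feature is added."""
    return FEATURES[name](user)
```

Under these assumptions, two programmers adding two features each touch only their own handler, so they are less likely to collide in the same file.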
- Dirty code reduces team morale.
Most programmers I know get pleasure and validation by shipping code that makes people happy. Any frustration that gets in the way of that basic need reduces morale. If an improvement to the product seems like it should be a three-hour task but takes two days of investigation and pain, the programmer is unlikely to feel like they can make a difference.
- Dirty code makes programmers slower.
If there are two systems for doing X, and you want to make an improvement to the way X is done, you have to change both systems, increasing effort and the risk of regression.
If the concepts in the domain don't map to your objects, programmers have to struggle to find the right place for new code.
If A and B are unrelated aspects of the system and the logic for A and B is glommed together, changes to A involve understanding B too. The more aspects are coupled together, the more it costs to change each of them.
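The "two systems for doing X" case can be sketched in a few lines of Python. The functions below are hypothetical (a made-up name-formatting rule, not anything from IMVU); the point is only that both callers share one definition of the rule, so an improvement to it is made once.

```python
def display_name(raw: str) -> str:
    """The single place that defines how a user name is normalized."""
    return raw.strip().title()


def chat_label(raw: str) -> str:
    # Chat formatting depends on the shared rule instead of copying it.
    return f"<{display_name(raw)}>"


def profile_heading(raw: str) -> str:
    # Profile formatting reuses the same rule, so the two cannot drift apart.
    return f"Profile: {display_name(raw)}"
```

If each caller had its own copy of the strip-and-capitalize logic, every change to the rule would have to be made twice, and a missed copy would show up as an inconsistency in the product.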
- Dirty code inhibits the formation of an ownership culture.
When the code is too complicated for anyone to fit it in their heads, programmers will tend to blame the legacy code or architecture for any bugs or regressions that crop up. If they perceive it's too expensive to fix the architecture, they will not feel responsible if the product ends up being low-quality. To build a sustainable, high-quality product, the programmers ultimately need to feel responsible, and the feedback loop between customers and programmers needs to be closed.
Product Costs
- If product concepts are not reflected in the code, programmers might implement features in ways that don't make sense in the product.
To explain this, I'll give an example from the IMVU client: The business rules around product loading are complicated to begin with. Worse, the code does not directly reflect said rules. Because of this, our attempts to implement a better loading experience (including progress bar and object prioritization) have failed multiple times, and we still don't have it quite right.
- Dirty code incentivizes the business to invest in tangential revenue work rather than attacking core business problems.
For most startups*, the primary product or core competency should ultimately generate the most revenue. If management perceives it's too expensive to work on the core product, they will tend to fund tangential work such as new payment methods or bizdev deals. There's a point where that kind of work makes sense (and pays for itself), but the core offering should be the biggest lever, and the company should rally around that.
* During Web 2.0, your mileage may vary.
Quality Costs
- Even with good automated test coverage, dirty code increases the risk of introducing regressions.
If a module has too many dependencies or responsibilities, changes to it can have unintended consequences. Automated test coverage helps a great deal, but think of test coverage as approximating (number of tested conditions) / (number of possible conditions). The combinatorial growth of states in dirty code effectively reduces the coverage of your automated (and manual!) tests, allowing regressions to slip through to customers. Threads, if statements, and nullable objects all multiply the number of possible conditions, and so reduce the effective coverage of the tests you already have.
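To make the ratio concrete, here is a back-of-the-envelope Python sketch. It assumes, as a simplification, that each if statement or nullable object behaves like one independent boolean condition; real code is messier, and the function names are made up for this example.

```python
def state_count(num_flags: int) -> int:
    """Number of combinations of `num_flags` independent boolean conditions."""
    return 2 ** num_flags


def approx_coverage(tested_conditions: int, num_flags: int) -> float:
    """(number of tested conditions) / (number of possible conditions), capped at 1."""
    return min(1.0, tested_conditions / state_count(num_flags))


# The same eight tests, applied to modules with more and more conditions.
for flags in (3, 6, 10):
    print(flags, state_count(flags), round(approx_coverage(8, flags), 4))
```

Under that assumption, eight tests fully cover a module with three conditions, cover 12.5% of one with six, and cover less than 1% of one with ten, even though the test suite itself never changed.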
- Wide or unclear dependencies reduce the quality of tests.
In our experience, unit tests around dirty code tend to involve lots of mocks, partial mocks, and monkey patching, reducing the likelihood that tests will actually catch regressions. Worse, these tests are more likely to fail after benign refactorings. As we refactor our objects to better reflect the actual domain (updating the tests as we go), our confidence in the tests improves and they become dramatically easier to understand.
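One way to see the difference is a small Python sketch in which the dependency is narrow and passed in explicitly. The `Greeter` class and its `current_hour` parameter are hypothetical, chosen only to keep the example short; this is one possible approach, not a description of how IMVU's tests were written.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Greeter:
    # The clock is injected, so a test can supply a plain function
    # instead of monkey patching a global time source.
    current_hour: Callable[[], int]

    def greeting(self) -> str:
        return "Good morning" if self.current_hour() < 12 else "Good afternoon"


# A test needs no mocking library at all:
assert Greeter(lambda: 9).greeting() == "Good morning"
assert Greeter(lambda: 15).greeting() == "Good afternoon"
```

Because the test only touches the object's public interface, renaming or restructuring internals should not break it, which is the property that heavily mocked tests tend to lose.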
- Dirty code hides real bugs.
If the code is too complicated for the programmers to understand and has too many possible states to effectively test, what makes you think you can know whether it reliably works? In our experience, every dirty module corresponds to an area of the product that our users report as intermittently breaking. When refactoring those systems, we inevitably discover race conditions, bugs in edge cases, and performance problems.
- Dirty code gets dirtier.
Finally, if your team does not get used to improving dirty code, it will only get worse. Eventually, the programmers (and maybe even management) will start calling for a rewrite (a.k.a. resetting the shitty counter). Software design and refactoring are skills that take practice. Honing them will keep your business and technology nimble.
Final Thoughts
I've talked a lot about why dirty code is expensive, so you may be asking "Well, what can I do about it?" First, try to pay attention to your code. After you finish writing some, ask yourself "Could I make this clearer?" Then ask your neighbor the same question. Beyond that, here are some resources that will help you improve your design skills:
- Jeremy Miller's Blog
- Refactoring: Improving the Design of Existing Code (Addison-Wesley Object Technology Series)
- Refactoring to Patterns (Addison-Wesley Signature Series)
- Open/closed principle
- Liskov substitution principle
- Object-oriented design
It's definitely possible to keep that feeling of 'newness' in your code. Hopefully I've convinced you that the extra few hours or days to clean up after every project easily pay for themselves. Code is the lifeblood of our industry - keep it clean!
See discussion on Hacker News (2010).