When writing unit tests for new code or existing code, it's often tempting to call test-only methods to get at private implementation details.

Here I will argue that test-only methods are a mistake and that it's critical to distinguish between testing code and testing the interface's behavior.

As an example, let look at a simple Python thread pool implementation.

class ThreadPool:
    def __init__(self):
        self._queue = Queue.Queue()
        self._threads = []
        self._activeThreads = []

    def run(self, fn):
        self._queue.put(fn)
        if len(self._activeThreads) == len(self._threads):
            t = threading.Thread(target=self.__thread)
            t.setDaemon(True)
            t.start()
            self._threads.append(t)

    def __thread(self):
        while True:
            q = self._queue.get()
            self._activeThreads.append(None)
            q()
            self._activeThreads.pop()

How would I unit test such a thing? Maybe something like:

def test_queuing_a_job_starts_a_thread():
    t = ThreadPool()
    t.run(lambda: None)
    assertEqual(1, len(t._threads))

That test would pass! But it's also a bad test for several reasons:

  • It doesn't assert that the job is run, meaning the test could pass if the implementation is broken.
  • It assumes implementation details: that the number of threads increases to one the first time a job is added.
  • Finally, the test has not protected itself from valid refactorings. If someone renamed or eliminated the private _activeThreads variable, the test would fail, even if the thread pool still behaved as advertised.

And there's the important point: behaved as advertised. Tests should verify that the object does what it says it does. That is, an object does work through its public interface, either by producing a side effect or returning a value.

Let's write a better test for the same functionality.

def test_queueing_a_job_starts_a_thread():
    begin = threading.Semaphore(0)
    end = threading.Semaphore(0)
    calls = []
    def job():
        begin.acquire()
        calls.append(None)
        end.release()
    t = ThreadPool()
    t.run(job)
    begin.release()
    end.acquire()
    assertEqual(1, len(calls))

This test no longer needs to reference any private fields or methods: it simply asserts that the given job is run and that said job does not block the main thread.

An aside: a simple implementation of run could be def run(fn): fn() but this test would hang (and thus fail) if fn was run on the same thread.

Writing tests against a public interface is validation for your object's public interface. It requires you to understand the object's interface. It shows examples for how to use your object elsewhere in the production system.

It also means the implementation of the object's invariants are protected: refactoring internals will never break tests unless the object itself no longer meets its public interface.

What if the code under test is too hard to reach through the public interface?

And here we come to a common argument in support for testing internals: "What if the code under test is too hard to reach through the public interface?" Well maybe your object is too complicated! Split the object into multiple objects with simple public interfaces and compose them.

Otherwise objects will end up with combinatoric invariants. Imagine the implementation of list.append. It probably has a typical fast path plus the rare occasion where it must reallocate the list's memory.

Note that the ThreadPool class doesn't implement its own list allocation logic (nor a thread-safe queue): it reuses existing objects with clear and sensible interfaces.

Thus the threadpool tests can simply ignore the implementation of list.append and assume it works. list.append is tested elsewhere.

Refactoring is Hard Enough

Refactoring is hard enough. It's critical that you build your systems to allow refactoring of internals. Any bit of friction, like incorrectly failing tests, may get in the way of engineers improving the code.

But what about caching and performance?

Caching and performance are an interesting case. Imagine a perspective camera object in a 3D renderer. It takes field-of-view, aspect ratio, and perhaps near and far distances as inputs and spits out a projection matrix.

As an optimization, you may be tempted to cache the projection matrix to avoid computation if it's requested multiple times with the same input. How do you test such a cache?

# pseudocode
class Camera:
    # ... setters and such go here

    def getProjectionMatrix():
        if _projectionMatrixDirty: recalculate()
        return _projectionMatrixCache
    bool _projectionMatrixDirty
    Mat4f _projectionMatrixCache

It's tempting to set inputs and assert _projectionMatrixDirty or perhaps assert the contents of the _projectionMatrixCache field.

However, these unit tests would not actually assert the desired behavior! Yes, you can maybe assert that you're caching data correctly, but nobody actually cares about that. Instead you want to test the motivating intent behind the cache: performance.

Try writing performance tests! Call getProjectionMatrix() multiple times and assert it takes less than N microseconds or cycles or whatever.

In some cases it may even be faster to recompute the matrix than cache it.

tl;dr

Here's my rule of thumb: if a method on an object is only called by tests, it's dead. Excise it and its tests.

Be rigorous and clear with your public interfaces. Protect your code with tests. Don't let internal object refactoring break your unit tests!