EDIT 2015-11-02: I added a couple more nice-to-haves which I think are pretty important. See the end of the list.

I've used half a dozen unit testing frameworks, and written nearly that many more. Here is the set of features that I consider a requirement in any test framework.

You'd think some of the following requirements are so obvious as to not need mentioning... I, too, have been shocked before. :)

Must-Haves

  • Minimal boilerplate. Writing tests should be frictionless, so setting up a new test file should be little more than a single import and maybe a top-level function call. (I sketch what this looks like, along with autodiscovery, right after this list.)

  • Similarly, each test name should only have to be uttered once. Some frameworks have you, after writing all of your tests, enumerate them in a test list. I'm even aware of a Haskell project that repeated each test name THREE times: once for the type signature, once for the test itself, and once for the test list.

  • Assertion failures should provide a stack trace, including filename and line number. There are test frameworks that make you name each assertion rather than giving a file and line number. Never make a human do what a computer can. :)

  • Assertion failures should also include any relevant values. For example, when asserting that x and y are equal, if they are not, the values of x and y should be shown.

  • Test file names and cases should be autodiscovered. It's too easy to accidentally not run a bunch of tests because you forgot to register them.

  • The framework should support test fixtures -- that is, common setup and teardown code per test. In addition, and this is commonly missed, fixtures should be nestable: test setup code should run from the most base fixture to the most derived fixture, then all the teardown code should run in the reverse order. Nested fixtures allow reusing common environments across many tests. The BDD frameworks tend to support this because nested contexts are one of their selling points. (Sketched after the list.)

  • It should be possible to define what I call "superfixtures": code that runs before and after each test, whether or not the test specifies a fixture of its own. This is useful for making general assertions across the code base (or regions thereof), such as "no test leaks memory". (Also sketched after the list.)

  • Support for abstract test cases. Abstract tests let you define a set of N tests that each operate on an interface, plus a set of M fixtures, each providing an implementation of that interface. The framework then runs all M*N combinations. This makes it easy to verify that a bunch of implementations all expose the same behavior. (See the sketch after the list.)

  • A rich set of comparison operators: for example, equality, identity, membership, and string matching. This allows tests to provide more context upon failure, but also makes it easy for programmers to write appropriate, concise tests in the first place. (Bonus points: frameworks like py.test have a single assertion form but examine the assertion expression to automatically print any relevant context upon failure -- there's an example after the list.)

  • Printing to stdout should be unbuffered and interleaved properly with the test reporter's output. I only include this because Tasty utterly fails this test. :) Here's Tasty mangling a test's innocent putStrLn into its own reporter output:

is same as not caching:                OK (1.45s)He[lmo! [ 3T2h;i2s2 mis
 a [vmery[ 3i7n;n2o2cmen t   p u+t+S+t rOLKn,.  p aIs sheodp e1 0i0t  tdeosetssn.'
t a[fmfe c tt etshte  etxetsetr noault pmuetm.o
ize happy path:      OK
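
To make the boilerplate and autodiscovery points concrete, here's roughly what the floor looks like with py.test (the file and function names are mine, purely for illustration). Drop this file anywhere under the test root and it runs -- no imports, no registration, no test list:

    # test_math.py -- picked up automatically because the filename
    # and the function names start with "test".

    def test_addition():
        assert 1 + 1 == 2

    def test_subtraction():
        assert 5 - 3 == 2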
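
For nested fixtures, here's a sketch of the ordering I mean, using py.test's fixture dependencies (the fixtures are hypothetical stand-ins for real resources). Setup runs base-first, teardown runs derived-first:

    import pytest

    @pytest.fixture
    def database():
        db = {"connected": True}    # base setup runs first
        yield db
        db["connected"] = False     # base teardown runs last

    @pytest.fixture
    def user(database):
        database["user"] = "alice"  # derived setup runs second
        yield "alice"
        del database["user"]        # derived teardown runs first

    def test_user_sees_connection(database, user):
        assert database["connected"]
        assert database["user"] == "alice"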
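
Superfixtures map naturally onto py.test's autouse fixtures. A sketch of the "no test leaks memory" idea -- the object-count check below is purely illustrative, not a rigorous leak detector:

    # conftest.py -- applies to every test under this directory,
    # whether or not the test asks for any fixture of its own.
    import gc

    import pytest

    @pytest.fixture(autouse=True)
    def no_leaked_objects():
        gc.collect()
        before = len(gc.get_objects())
        yield                        # the test itself runs here
        gc.collect()
        after = len(gc.get_objects())
        # Crude threshold, for illustration only.
        assert after - before < 1000, "test leaked a suspicious number of objects"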
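
Abstract test cases fall out of parametrized fixtures in py.test. In this sketch (the stack classes are throwaway examples), M = 2 implementations and N = 2 tests, so 4 tests run:

    import collections

    import pytest

    class ListStack:
        def __init__(self):
            self._items = []
        def push(self, item):
            self._items.append(item)
        def pop(self):
            return self._items.pop()

    class DequeStack:
        def __init__(self):
            self._items = collections.deque()
        def push(self, item):
            self._items.append(item)
        def pop(self):
            return self._items.pop()

    # One parametrized fixture provides the M implementations...
    @pytest.fixture(params=[ListStack, DequeStack])
    def stack(request):
        return request.param()

    # ...and each of the N tests below runs once per implementation.
    def test_push_then_pop(stack):
        stack.push(42)
        assert stack.pop() == 42

    def test_pop_is_lifo(stack):
        stack.push(1)
        stack.push(2)
        assert stack.pop() == 2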
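
And to illustrate the single-assertion-form bonus: py.test inspects a failing assert expression and reports the operand values itself, so the plain comparison below fails with both dictionaries printed:

    def test_lookup():
        expected = {"id": 1, "name": "alice"}
        actual = {"id": 1, "name": "bob"}
        assert actual == expected    # failure output includes both values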

Nice-to-Haves

  • Customizable test reporting. There are two reasons to want this. The first, colored test output, is technically a nice-to-have, but it's a huge one: it probably shaves a few seconds off of each visual scan of the test results. The second is integrating test output with continuous integration software, which is a big win too.

  • Parallelism. The built-in ability to run tests in parallel is a nice way to reduce testing turnaround time. Either opt-in or opt-out parallelism is fine. And if necessary, it's easy to work around the lack of parallelism and still make efficient use of test hardware: divide the tests into even slices and run each slice on its own machine or VM. (There's a sketch of this after the list.)

  • Property-based testing, a la QuickCheck. While QuickCheck is amazing, and property-based testing will change your life, the bread and butter of your test suite will be unit tests. (A Python-flavored sketch follows the list.)

  • Direct, convenient support for disabling tests. Without this capability, people just comment out the test, but commented-out tests don't show up in the test metrics, so they tend to get forgotten. Jasmine handles this very well: simply prefix the disabled fixture or test with "x". That is, if a test is spelled it('should return 2', function() { ... }), disabling it is as easy as changing it to xit. (The Python equivalent is sketched after the list.)
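
On the parallelism workaround: deterministic slicing is all it takes. A sketch -- the shard numbers would come from your CI system, and none of this is any framework's built-in API:

    def tests_for_shard(all_tests, shard_index, shard_count):
        # Sort so every machine agrees on the ordering, then take
        # every shard_count-th test starting at this shard's offset.
        return sorted(all_tests)[shard_index::shard_count]

    # e.g. machine 2 of 4 runs its quarter of the suite:
    # run(tests_for_shard(discovered_tests, 2, 4))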
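
In Python, the QuickCheck niche is filled by the hypothesis package; a minimal sketch:

    from hypothesis import given
    from hypothesis import strategies as st

    # Property: reversing a list twice gives back the original list.
    # hypothesis generates the inputs and shrinks any failing case.
    @given(st.lists(st.integers()))
    def test_reverse_twice_is_identity(xs):
        assert list(reversed(list(reversed(xs)))) == xs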
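
And the Python analog of xit: py.test's skip marker keeps disabled tests visible in the summary instead of letting them vanish into comments:

    import pytest

    @pytest.mark.skip(reason="flaky on CI; see the issue tracker")
    def test_should_return_2():
        assert 1 + 1 == 2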

I could build the feature matrix across the test frameworks I've used, but only a handful are complete out of the box. (If anyone would like me to take a crack at filling out a feature matrix, let me know.)

The Python unit testing ecosystem is pretty great. Even the built-in unittest package has almost every feature. (I believe I had to manually extend TestCase to provide support for superfixtures.) The JavaScript world, until recently, was pretty anemic. QUnit was a wreck, last time I used it -- there is no excuse for not including stack traces in test failures. Jasmine, on the other hand, supports almost everything I care about. (At IMVU, we ended up building imvujstest, part of imvujs.)

In the C++ world, UnitTest++ comes very close to being great. The only capabilities I've had to add were superfixtures, nested fixtures, and abstract test cases. In hindsight, I wish I'd open sourced that bit of C++ macros and templates while I could have. :)

go test by itself is way too simplistic to be used for a sophisticated test suite. Fortunately, the gocheck package is pretty good. It's possible to make abstract tests work in gocheck, at the cost of some boilerplate. However, today, gocheck doesn't support nested fixtures. I suspect they'd be amenable to a patch if anyone wants to take that on.

The Haskell unit testing ecosystem is less than ideal. Getting a proper framework that satisfies the above requirements takes considerably more effort than the other examples I've given. Everything I've described is possible with HUnit and various Template Haskell packages, but it takes quite a lot of package dependencies and language extensions. I have dreams of building my ideal Haskell unit test framework... perhaps the next time I work on a large Haskell project.

If you're building a test framework, the most important thing to focus on is a rapid iteration flow: write a test, watch it fail, modify the code, watch the test pass. It should be easy for anyone to write a test, and easy for anyone to run them and interpret their output. The faster you can iterate, the more your mind stays focused on the actual problems at hand.

EDIT: More Nice-to-Haves

  • Copy-and-paste test names back into the runner. It's pretty common to want to run a single test again. The easiest way to support this is to allow the exact test name to be passed as a command-line argument to the runner. Test frameworks that automatically strip underscores from test names or that output fixture names in some funky format automatically fail this. BDD frameworks fail this too because of their weird English-ish test name structure. (See the sketch below.)
  • Test times. Tests that take longer than, say, one millisecond should have their running times printed alongside the test result. Test times always creep upward, so it's important to keep them visible.
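
For the record, py.test passes the copy-and-paste test: failures are reported with a node id like test_math.py::test_addition (the file and test names here are from my earlier sketch) that can be pasted straight back onto the command line, or handed to pytest.main programmatically:

    import pytest

    # Re-run exactly one test by pasting its reported id verbatim.
    pytest.main(["test_math.py::test_addition"])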