TDD for algorithms: the state of the art
The Agile Zone is brought to you in partnership with JetBrains. Learn how Agile Boards in YouTrack are designed to help teams plan, visualize and manage their work in an efficient manner, with support for both Scrum and Kanban processes.
In his example, Uncle Bob follows strict rules of evolution in his code for implementing sorting, and finally reaches quicksort instead of bubblesort. If you don't remember, a criticism of TDD is that it accepts naive solutions like bubblesort: it is normal trying to Test-Driven Development an ordering algorithm, and ending up with bubblesort or insertion sort, which have O(n^2) computational complexity, far from the optimum O(N logN), the boundary for comparison sorts.
I've yet to apply the Transformation Priority Premise on a real world problem however, and I'm waiting for independent confirmations. Today we will discuss about what goals you can reach with Test-Driven Development, by also citing some insightful references. We are trying to learn if TDD is useful for creating algorithms, or only as a discipline for coding something we already know how to do.
There has been a recent comparison of the two different Test-Driven Development schools: The Classic versus London TDD describes the classic school as an attempt to discover algorithms with incremental testing, while the London school always starts from a big picture, like an entire application.
When Classic TDD is illustrated, it's usually an example where an algorithm is fleshed out one test at a time. For example, if test-driving an implementation of, say, a program that converts integers into roman numerals, we might start with the simplest test "roman numeral for 1 is I". The simplest solution might be to return the hardcoded literal value "I". Then we move on to the next easiest failing test.
The dichotomy has been described as a fallacy as both styles are necessary during development: the Classic style for small, reusable parts, the London one for acceptance and integration testing.
However, it has been pointed out that practitioners of the Classic style often draws on tacit design knowledge, for example in the article Making too much of TDD by Michael Feathers:
The emergent design meme, for better or worse, became intimately woven into Test Driven Development. We found that, in the small, you could start developing a system without having a plan for its design, and, through the application of a set of rules and some reflection, arrive at very good designs. When I think back about this, I call this the "Look ma, no hands!" era of TDD. Critics would rightly point out that people who were doing this well were drawing on a lot of tacit knowledge of good design. For the most part, we agreed. We just felt that this knowledge of good design could be taught, and people could make continuous decisions across the development cycle to grow good design organically. You do have to bring your design knowledge to the table. Red/Green/Refactor is a generative process, but it is extremely dependent upon the quality of our input.
My take on this is that actually producing reusable algorithms is the job of the computer scientist or operations researcher; our job is to implement published algorithms, and to reuse existing implementations if they are too complicated (@xpmatteo would make the example of cryptography), without tying too much our design to libraries and frameworks.
What if we don't know how to make a test pass?
A personal experience of mine, for example, comes from Matlab and algorithms for multimedia algorithms like Harris or Canny feature detection, used to find interesting points on images. Apart from the difficulty of testing their results and defining them beforehand, they're difficult to code by incremental improvements too; however, when you can, you learn why the algorithms works in this way. A much better choice than barely memorizing it. The same could be said of sorting algorithms: after that old lesson in college, we often forget why quicksort and mergesort are faster.
Thus TDD is a tool for design, because it forces to define and focus on an interface rather than an implementation. But if you do not know how to design, listening to your tests will be painful as they would scream and either be ignored, or halt the development because you do not know how to fix the situation, and return to easy tests.
A hammer (what an overused metaphor) is useful, but if you start throwing it
around, not knowing how to hit the nail, you will make more harm
than benefit. That's why I write and write about patterns both for the production code and the testing code: they're a step above the single class in design. With TDD you can easily develop a single class, but won't feel pain very much even after having put some hundred lines in it. When it comes to divide the work in multiple classes, your spidey design sense should be start tickling, and instantly see if some pattern applies or it would be a bad idea.
Uncle Bob made the right point again on TDD a while ago, in a quote which I printed and hanged in my office:
Look, TDD is not my religion, it is one of my disciplines. It’s like dual entry bookkeeping for accountants, or sterile procedure for surgeons. Professionals adopt such disciplines because they understand the theory behind them, and have directly experienced the benefits of using them.
Like Jack in Lost, you can operate without the right environment in emergency conditions or when time is very short (for example, a bomb will go off if you do not compile something in the next minute). But will it be less or more difficult? Will it be safer? Will it be desiderable for the patient?
Surgeons follow strict procedures due to the risks involved, without inventing as they go along (I dare you to accept an emergent design for your bypass). But should you invent when doing development? I mean, how can you estimate how long will it take to prepare an application if you do not know how to code it? You do not know which pattern you will be using, how many classes there will be in this package, or what fields should the objects have. And that's right: it's far more flexible to decide this constraints after having got feedback from the code. But usually you have all the knowledge and experience necessary for making these choices.
Now imagine you're told to prepare an application that performs face recognition. Will you start from a test? Maybe, but it will stay red for a long time. In that case, after defining what I want with some informal user stories, I will start doing spikes, exploratory testing and walking skeletons to learn more about the problem.
Spikes are not test-driven, while walking skeletons have tests written on a very high-level and to test a design which is already fleshed out (in fact, they're used to discover if an architecture can work).
Only when I'm confident that I can make a test pass, I write it. But I have a rule: anything that goes under source control should be tested.
My final thought is that during development, the solution space is very large and there is an infinite amount of code which solves your problem. TDD drives us towards a solution which is maintainable, presents a good Api (since the test is its first example of client code) and minimal because you can easily try to delete a line of code and see if the tests are still green. TDD is desiderable, but you cannot think that by following blindly the Red-Green-Refactor cycle you will end up with a full-fledged application. It's only the basic workflow: you still have to do the thinking, and to gain the knowledge necessary.
In short, this is the line I would try to follow: to make this test pass. should I use a Map or a List? Uhm, first I have to learn what a Map and a List are, and the difference between them. Let's open a data structures book. I don't know how to make this test pass, and maybe after some experimentations and research I will have learned more; I will be able to write a better test, and to actually get to green.
Let me know what you think about developing algorithms with or without TDD in the comments.