I was recently involved with a number of developers who had been tasked with a Java programming assignment. The assignment was one of those hypothetical “take some input and generate some output” exercises that are as much about how you structure a Java application as they are about the actual logic involved.
Looking over the submissions, I found that they fell broadly into two camps: those who put all the code into a main class with static methods, and those who broke the code down into interfaces with tests.
One student asked why all the additional boilerplate involved in creating interfaces and tests was worth the trouble. So I thought I would demonstrate why with a little assignment of my own.
For this assignment, you need to build an application that:
- Downloads XML from an HTTP server
- Converts the XML String to a Document
- Transforms the XML Document with an XSLT to include a new root element
- Prints the resulting XML to the console
This first example satisfies the requirements with a single class and a single method.
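The original listing is not reproduced here, so what follows is only a minimal sketch of what a single-method solution might look like. It assumes JDK-only APIs rather than any particular HTTP library, a hypothetical stylesheet that wraps the document in a wrapper element, and a source URL passed as the first command line argument. The class name SingleClassApp is also just an illustration.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SingleClassApp {
    // Hypothetical stylesheet: copies the original root into a new <wrapper> root.
    private static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><wrapper><xsl:copy-of select=\"*\"/></wrapper></xsl:template>"
        + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        // 1. Download the XML from the URL given as the first argument.
        String xml;
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }

        // 2. Convert the XML String to a Document.
        DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
        builders.setNamespaceAware(true);
        Document source = builders.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // 3. Transform the Document with the XSLT to add the new root element.
        TransformerFactory transformers = TransformerFactory.newInstance();
        DOMResult transformed = new DOMResult();
        transformers.newTransformer(new StreamSource(new StringReader(XSLT)))
            .transform(new DOMSource(source), transformed);

        // 4. Convert the result back to a String and print it to the console.
        Transformer serializer = transformers.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(transformed.getNode()), new StreamResult(out));
        System.out.println(out);
    }
}
```

Every step lives in main, so none of them can be called, replaced or tested on its own.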
This second example satisfies the requirements with multiple classes and interfaces.
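The original listing is not reproduced here either, so this is a rough sketch of how the multiple class version could be laid out. Only the XmlService name comes from this article. HttpService, UrlHttpService, JaxpXmlService, MultiClassApp, the method names and the wrapper stylesheet are all assumptions made for illustration, and the JDK's own URL and JAXP classes stand in for whatever libraries the real code used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Networking concerns only (hypothetical interface). */
interface HttpService {
    String get(String url) throws IOException;
}

/** XML concerns only; the method names are assumptions. */
interface XmlService {
    Document parse(String xml) throws Exception;
    Document transform(Document source, String xslt) throws Exception;
    String serialize(Document document) throws Exception;
}

class UrlHttpService implements HttpService {
    @Override
    public String get(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}

class JaxpXmlService implements XmlService {
    private final TransformerFactory transformers = TransformerFactory.newInstance();

    @Override
    public Document parse(String xml) throws Exception {
        DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
        builders.setNamespaceAware(true);
        return builders.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    @Override
    public Document transform(Document source, String xslt) throws Exception {
        DOMResult result = new DOMResult();
        transformers.newTransformer(new StreamSource(new StringReader(xslt)))
            .transform(new DOMSource(source), result);
        return (Document) result.getNode();
    }

    @Override
    public String serialize(Document document) throws Exception {
        Transformer serializer = transformers.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }
}

class MultiClassApp {
    // Hypothetical stylesheet: copies the original root into a new <wrapper> root.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><wrapper><xsl:copy-of select=\"*\"/></wrapper></xsl:template>"
        + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        HttpService http = new UrlHttpService();
        XmlService xml = new JaxpXmlService();
        Document source = xml.parse(http.get(args[0]));
        Document transformed = xml.transform(source, XSLT);
        System.out.println(xml.serialize(transformed));
    }
}
```

The main method shrinks to a handful of lines that only wire the services together, while each service can be exercised in isolation.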
Needless to say, the second implementation is far more verbose, and looks something like what a fifth year developer would produce. All jokes aside, why would anyone want to add such complexity to their code?
Let’s assume that code is put into production, and many years later there is a need to write a second application that does all the same processing as the first, but also uploads the transformed XML via a second HTTP operation.
The single class implementation will have to be copied and pasted into a new class, and some additional code added to perform the HTTP upload. This code has no granularity, so reuse can only be achieved by copy and paste.
Duplicating code like this is the number one code smell that Martin Fowler calls out in his refactoring book:
Number one in the stink parade is duplicated code. If you see the same code structure in more than one place, you can be sure that your program will be better if you find a way to unify them.
On the other hand, even if you copied and pasted the main method from the multiple class implementation, you would be copying 5 lines of code. And even that small amount of copy-paste could be avoided with some minor refactoring.
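As a sketch of what that minor refactoring might look like, the shared steps can be moved into one small class that both applications call. Everything here apart from the XmlService name is hypothetical: XmlPipeline, UploadService, PrintingApp, UploadingApp and the method signatures are assumptions, and the service implementations are left out to keep the focus on the wiring.

```java
import java.io.IOException;
import org.w3c.dom.Document;

interface HttpService {
    String get(String url) throws IOException;
}

/** The only genuinely new capability the second application needs. */
interface UploadService {
    void upload(String url, String body) throws IOException;
}

interface XmlService {
    Document parse(String xml) throws Exception;
    Document transform(Document source, String xslt) throws Exception;
    String serialize(Document document) throws Exception;
}

/** The steps both applications share, written exactly once. */
class XmlPipeline {
    private final HttpService http;
    private final XmlService xml;
    private final String xslt;

    XmlPipeline(HttpService http, XmlService xml, String xslt) {
        this.http = http;
        this.xml = xml;
        this.xslt = xslt;
    }

    String fetchAndTransform(String url) throws Exception {
        Document source = xml.parse(http.get(url));
        return xml.serialize(xml.transform(source, xslt));
    }
}

/** First application: print the transformed XML. */
class PrintingApp {
    static void run(XmlPipeline pipeline, String sourceUrl) throws Exception {
        System.out.println(pipeline.fetchAndTransform(sourceUrl));
    }
}

/** Second application: same processing, then upload the result. */
class UploadingApp {
    static void run(XmlPipeline pipeline, UploadService uploader,
                    String sourceUrl, String targetUrl) throws Exception {
        uploader.upload(targetUrl, pipeline.fetchAndTransform(sourceUrl));
    }
}
```

With this shape, the second application adds one interface and one line of behavior instead of a second copy of the whole process.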
Single methods that implement complex processes are exceptionally difficult to reuse without resorting to copy and paste, and copy and paste code is the devil.
Let’s assume that now there is a requirement to pretty print the resulting XML.
This is actually not a lot of work, as it only involves a few additional options to the transformer that converts an XML Document to a String.
If your code references an implementation of the XmlService interface, then there is only one place where this change needs to happen. You can verify the output of your change in a unit test, and roll the new functionality out to all the code that uses the service with a simple change in one location.
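To make the size of that change concrete, here is a sketch of a serialization method with pretty printing switched on. The class name PrettyPrintingSerializer is hypothetical. OutputKeys.INDENT is standard JAXP, while the indent-amount property is an implementation-specific key honored by Xalan-derived transformers such as the one bundled with the JDK, so treat the exact indentation as an assumption about your runtime.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrettyPrintingSerializer {
    String serialize(Document document) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // The pretty printing change: two extra output properties.
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific key; transformers that do not know it ignore it.
        serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader("<wrapper><a>hi</a></wrapper>")));
        System.out.println(new PrettyPrintingSerializer().serialize(document));
    }
}
```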
The single class implementation is not so easy to update. Thanks to the copy and paste reuse of code from a single class implementation, you now have hundreds of lines of code duplicated in your code base, and it is up to you to find each location where the code has been copied and update it. This is a time consuming and error prone process.
What happens when the web server is uncontactable? What happens if it returns invalid XML? These are all valid questions for production code, and unit tests are a very effective way to validate how code behaves in edge cases like these.
However, the single class implementation is very difficult to test, because all of the logic is in a single method, and it is impossible to test one aspect of the process without executing all the others. You couldn’t test the XML processing without first making a network call. You couldn’t test the networking code without also executing the XML manipulation.
When code is not decomposed into discrete units of functionality, testing is at best fragile, and at worst impossible.
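Once the networking sits behind an interface, those edge cases can be simulated without a network at all. The sketch below uses plain Java checks instead of a test framework to stay self-contained; in a real project these would normally be JUnit tests. HttpService, XmlDownloader and XmlDownloaderChecks are hypothetical names, and the stubs are lambdas that stand in for a healthy server, a server returning broken XML, and an unreachable server.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

interface HttpService {
    String get(String url) throws IOException;
}

/** Downloads and parses XML; the HTTP side is injected so it can be stubbed. */
class XmlDownloader {
    private final HttpService http;

    XmlDownloader(HttpService http) {
        this.http = http;
    }

    Document download(String url) throws Exception {
        String xml = http.get(url);
        DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
        builders.setNamespaceAware(true);
        return builders.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}

class XmlDownloaderChecks {
    private static final String URL = "http://example.invalid/data.xml"; // never contacted

    /** A stub that returns well-formed XML lets the parsing be checked alone. */
    static boolean parsesValidXml() throws Exception {
        Document document = new XmlDownloader(url -> "<a>hi</a>").download(URL);
        return "a".equals(document.getDocumentElement().getNodeName());
    }

    /** A stub that returns broken XML should surface as a SAXException. */
    static boolean reportsInvalidXml() throws Exception {
        try {
            new XmlDownloader(url -> "<a><unclosed>").download(URL);
            return false;
        } catch (SAXException expected) {
            return true;
        }
    }

    /** A stub that throws simulates an uncontactable server. */
    static boolean reportsUnreachableServer() throws Exception {
        HttpService down = url -> { throw new IOException("connection refused"); };
        try {
            new XmlDownloader(down).download(URL);
            return false;
        } catch (IOException expected) {
            return true;
        }
    }
}
```

The JDK parser may log the fatal parse error to standard error during the invalid XML check; that is expected noise rather than a failure.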
How well will this code function when we are making thousands of requests per minute?
I can actually tell you from experience that making thousands of requests with the Apache HTTP client can lead to file handle exhaustion. This is the sort of issue you’ll usually only find out because you have done performance testing, or because your production system has gone down.
Because the single class implementation is difficult to test, odds are that you’ll only find out about scaling issues in production.
The multiple class implementation exposes the networking logic in a discrete class that can be tested independently of the main application. It is easy to set up a mock HTTP server and hit it with thousands of requests to see how your code handles the load.
So why would you want to spend time writing a whole bunch of interfaces and classes when a single class approach solves all your immediate problems? The answer is that a single class solution is not reusable, is untestable and is difficult to maintain.
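A load check along those lines might look like the sketch below. It is built on assumptions: the JDK's built-in com.sun.net.httpserver.HttpServer plays the mock server, and java.net.http.HttpClient (Java 11 or later) stands in for the Apache HTTP client mentioned above. HttpService, JdkHttpService and HttpLoadCheck are hypothetical names. The check only counts successful responses, so spotting something like file handle exhaustion would still mean watching the process from the operating system while it runs.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

interface HttpService {
    String get(String url) throws IOException;
}

/** Stand-in for the application's networking class. */
class JdkHttpService implements HttpService {
    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public String get(String url) throws IOException {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for response", e);
        }
    }
}

class HttpLoadCheck {
    /** Sends sequential GETs to an in-process server and returns how many succeeded. */
    static int run(HttpService service, int requests) throws IOException {
        byte[] body = "<a>hi</a>".getBytes(StandardCharsets.UTF_8);
        // Port 0 asks the operating system for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/data", exchange -> {
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/data";
            int succeeded = 0;
            for (int i = 0; i < requests; i++) {
                if ("<a>hi</a>".equals(service.get(url))) {
                    succeeded++;
                }
            }
            return succeeded;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        int requests = 5000;
        int succeeded = run(new JdkHttpService(), requests);
        System.out.println(succeeded + " of " + requests + " requests succeeded");
    }
}
```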
Unfortunately all of these problems tend to show up only after the code has been put into production, and sometimes only years later. This can make it difficult to appreciate the importance of developing well structured code early on. But once you have experienced the pain of debugging or enhancing these single class implementations when the original developer has long since left the company, you’ll appreciate the elegance of small methods implementing interfaces with unit tests.