Don't rely on end-to-end tests: design for failure instead.
End-to-end testing a complex application with external dependencies is costly and slow. Spending that effort on designing for failure and error handling makes better business sense.
We typically understand software testing by the everyday definition of the word: making sure a piece of software performs the way it is supposed to in a production-like environment. For a complex distributed application with several external dependencies, nothing can beat a full end-to-end test. Or can it? To be clear: an end-to-end test is not there to ensure the integrity of your own software components, but to make sure they cooperate well with everything outside your sphere of influence. The first concern falls under integration testing, and you definitely need that. Since external dependencies are unreliable by nature, you would think they also merit very thorough testing. In practice, however, you will benefit more from building for failure than from trying to simulate every possible fault scenario. If, say, your automated process relies on a manually edited Excel sheet on a Windows share and someone left it open over the weekend (I'm not making this up), you're toast anyway. Things will go wrong in production. Since you cannot prevent them, better be prepared. Here's a real-world example.
At irregular intervals (usually six times a year) team X produces a varying number of tab-separated files for team Y to process in a classic extract/transform/load (ETL) flow, supplied as a zip file from an HTTPS address protected by Basic Authentication (i.e. a username and password). A scheduled process downloads this file daily and checks whether the MD5 digest of the content differs from that of the last processed run. If so, the database is updated. Usually nothing has changed, so nothing happens. This is a textbook case of an external dependency where literally anything could go wrong, since neither the server nor the contents of the file are under our direct control.
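For concreteness, here is a minimal sketch of that happy flow, assuming Java 17+; the URL, the credentials, and the helper methods are illustrative stand-ins, not the original system:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.HexFormat;

// Daily check: fetch the zip over HTTPS with Basic Authentication, hash the
// body, and only run the ETL when the digest differs from the last processed
// one. URL, credentials, and the helper methods are illustrative stand-ins.
public class DailyFeedCheck {

    public static void main(String[] args) throws Exception {
        String credentials = Base64.getEncoder().encodeToString(
                "user:secret".getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://feeds.example.org/latest.zip"))
                .header("Authorization", "Basic " + credentials)
                .build();
        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode());
        }

        String md5 = HexFormat.of().formatHex(
                MessageDigest.getInstance("MD5").digest(response.body()));
        if (!md5.equals(loadLastProcessedDigest())) {
            runEtl(response.body());   // unzip, validate, load into the database
            storeDigest(md5);
        }                              // otherwise nothing changed: nothing happens
    }

    static String loadLastProcessedDigest() { return ""; } // e.g. from a state table
    static void runEtl(byte[] zipContent) { }
    static void storeDigest(String md5) { }
}
```

To name just the obvious failure modes: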
- The webserver is not available; no HTTP response.
- The URL is not known to the server (HTTP NOT FOUND).
- We're not allowed to access it (UNAUTHORIZED or FORBIDDEN).
- There’s something wrong with our request headers (BAD REQUEST).
- The file is corrupt, empty, or otherwise unreadable as a zip file.
- The files are not the ones we expected; they must adhere to a strict naming scheme.
- The content is malformed; the number and names of columns are incorrect.
- The content is correctly formed but leads to an excessive number of rejections by the database, e.g. due to integrity constraints.
I could go on, but you can quickly see the three categories of errors and how to handle them: network errors, content errors, and processing errors (ETL logic and the database). Robust error handling means that the people who should handle a particular error are notified unambiguously. A quick-and-dirty solution built around the happy-flow scenario, one that simply dumps any error in a generic error log, will cost you dearly when someone has to plough through meaningless stack traces: not only an unpopular chore, but a waste of expensive developers' time that makes bad business sense.
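One way to make those categories actionable (a sketch, not the original implementation; the class names and responsible parties are assumptions) is a small exception hierarchy that carries its own routing information:

```java
// A failure category per audience: each exception knows who should be told.
abstract class FeedException extends Exception {
    FeedException(String message, Throwable cause) { super(message, cause); }
    abstract String responsibleParty();
}

class NetworkException extends FeedException {       // no response, 4xx/5xx
    NetworkException(String m, Throwable c) { super(m, c); }
    @Override String responsibleParty() { return "infrastructure team"; }
}

class ContentException extends FeedException {       // bad zip, wrong files or columns
    ContentException(String m, Throwable c) { super(m, c); }
    @Override String responsibleParty() { return "supplying team X"; }
}

class ProcessingException extends FeedException {    // ETL logic, integrity constraints
    ProcessingException(String m, Throwable c) { super(m, c); }
    @Override String responsibleParty() { return "ETL developers and DBA"; }
}
```

The class design itself hardly matters; the point is that every failure knows, at the moment it is thrown, who needs to hear about it.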
Now you could choose to cover all the above scenarios in tests against a production-like environment, but that is not easy to implement and will add considerable runtime to your test suite. Unit tests with stubs and mocks are a better option, provided your code is well organized. Tests of the network code are not interested in the contents of the file. Validation of the zip contents does not rely on downloading it: you just use a local file. Checking the integrity of the CSV contents does not rely on the zip format.
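Here is a sketch of what that separation buys you, assuming JUnit 5 and reusing the hypothetical ContentException from above; the naming regex and fixture path are also assumptions. Because the validator takes raw bytes rather than a URL, its tests need a local fixture file and no network at all:

```java
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.junit.jupiter.api.Test;

// Validates the archive in isolation: readable as a zip, at least one entry,
// every entry matching the naming scheme. The regex is an illustrative
// stand-in for the real naming rules.
class ZipValidator {

    void validate(byte[] content) throws ContentException {
        boolean sawEntry = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(content))) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                sawEntry = true;
                if (!entry.getName().matches("feed_\\d{8}_\\d+\\.tsv")) {
                    throw new ContentException("Unexpected file: " + entry.getName(), null);
                }
            }
        } catch (IOException e) {
            throw new ContentException("Cannot read zip archive", e);
        }
        if (!sawEntry) {
            throw new ContentException("Empty or not a zip archive", null);
        }
    }
}

class ZipValidatorTest {

    @Test
    void rejectsContentThatIsNotAZip() {
        byte[] garbage = "definitely not a zip".getBytes();
        assertThrows(ContentException.class, () -> new ZipValidator().validate(garbage));
    }

    @Test
    void acceptsWellFormedArchive() throws Exception {
        // A valid fixture checked into the test resources; no download involved.
        byte[] fixture = Files.readAllBytes(Path.of("src/test/resources/valid-feed.zip"));
        assertDoesNotThrow(() -> new ZipValidator().validate(fixture));
    }
}
```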
Whether you need all this robustness is a question of statistical probability combined with scale. In reality the above process hardly ever ran into trouble, but we had hundreds of such processes, so failures did happen on a daily basis, and more hardening was certainly worth the effort. Robust error handling also scales well: think of intelligent wrappers around your network, ETL, and database code that send useful messages to the responsible departments and persons. Another benefit of having mistakes quickly routed and solved is that they don't have to be flagged as a generic (read: alarming) incident, which is better for everyone's peace of mind.
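A minimal sketch of such a wrapper, reusing the FeedException hierarchy from above; the Notifier interface is a placeholder for whatever messaging channel your organization uses:

```java
// Runs one pipeline step and routes any categorized failure straight to the
// party that can fix it, instead of a generic error log.
@FunctionalInterface
interface FeedStep {
    void execute() throws FeedException;
}

interface Notifier {
    void send(String recipient, String message);
}

class GuardedRun {
    private final Notifier notifier;

    GuardedRun(Notifier notifier) { this.notifier = notifier; }

    void run(String stepName, FeedStep step) {
        try {
            step.execute();
        } catch (FeedException e) {
            // The exception carries its own routing information.
            notifier.send(e.responsibleParty(),
                    "Step '" + stepName + "' failed: " + e.getMessage());
        }
    }
}
```

Run every pipeline step through the wrapper and a constraint violation lands with the DBA while a malformed file lands with team X, with no generic incident raised along the way.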