Test Pyramid: Test Setup Best Practices

Here, I revisit the plain old test pyramid and discuss how to structure your tests to make them more reliable and more valuable.

Bartłomiej Żyliński

Jul. 25, 25 · Tutorial

Testing is essential for keeping the quality of our code high. In the long term, tests are crucial for keeping our software maintainable at all. Today, I will dive into the Test Pyramid and present a way to structure your tests to get the most out of them.

This is a revisited and revamped version of my original article on the Test Pyramid.

However, before we dive into the Test Pyramid, let’s take a look at the different types of tests that we have.

Tests Taxonomy

Unit Tests: The simplest tests, intended to verify the correctness of a single method or function in isolation.
Integration Tests: Verify the interaction between different modules of our application, usually one at a time, identifying issues at the interfaces between integrated parts.
E2E Tests: High-level tests that verify the correctness of a whole flow, from providing input to validating output on the opposite end. They validate whether the application works well as a whole.
Smoke Tests: Very simple tests that run on an up-and-running system, usually just after deploying a new version, to ensure that the most critical features are working as expected. They are a kind of sanity check of our system.
Contract Tests: Validate whether two sides of an interaction are compatible with one another. They check whether the responses from one side of the interaction match the expectations of the other, and vice versa.
Performance Tests: Verify whether the performance of our application meets the requirements. They are usually run on a setup as similar to production as possible and in the scope of the whole system.
Pen-Tests/Security Tests: A very broad catch-all term for all the checks and tests that verify the security of our system.
Chaos Testing/Engineering: More of an approach than an actual test. It aims to test system resilience through extreme measures, by introducing unpredictable but intentional and traceable failures into the working environment.

These are not all the types of tests out there; the exact list depends on whom you ask and how finely you are willing to categorize. I believe that the types mentioned above are the most important and the most reasonable ones, and we will focus on them in today’s text.
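To make one of the less familiar types concrete, here is a minimal sketch of the idea behind a contract test, written in Python with only the standard library. The field names, the `CONSUMER_CONTRACT` dictionary, and the `provider_get_user` stand-in are all hypothetical; a real setup would call the actual provider and would more likely use a dedicated contract-testing tool.

```python
# Hypothetical consumer expectation: field name -> required Python type.
CONSUMER_CONTRACT = {"id": int, "email": str, "active": bool}


def provider_get_user(user_id: int) -> dict:
    """Stand-in for the provider side; a real test would call the actual service."""
    return {"id": user_id, "email": "user@example.com", "active": True, "plan": "free"}


def contract_violations(response: dict, contract: dict) -> list:
    """Return human-readable violations; an empty list means the sides are compatible."""
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return violations


# Extra fields such as "plan" are tolerated; only the consumer's expectations matter.
assert contract_violations(provider_get_user(7), CONSUMER_CONTRACT) == []
```

The check is deliberately one-directional here: it fails only when the provider stops delivering something the consumer relies on.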

Original Test Pyramid

The Test Pyramid is a visual model of the test setup to which a system should aspire. It consists of different types of tests, ordered so that the base is the type with the highest number of tests. Moving up the pyramid, each level represents a type with fewer tests in the overall set.

In my opinion, the best representation of this test pyramid is presented by Robert C. Martin in his book, The Clean Coder: A Code of Conduct for Professional Programmers.

Basically, we should have a high number of unit tests as a base, with only a small set of integration and E2E tests on top. Performance and security tests are included under System tests.

This approach has a few good points, like:

Fast and cost-effective feedback: Unit tests are fairly easy to set up and, at least by the book, should run quickly, reducing the feedback loop for the developer.
It is CI/CD friendly: Having fewer complex tests, like E2E and integration tests, makes it simpler to set up CI/CD jobs. Besides, CI runs faster with fewer integration and E2E tests.
Reliability: Unit, component, and integration tests are less flaky and less complex than full E2E tests. Thus, we have smaller chances of any non-deterministic errors while introducing new tests and/or changing our test environment.

Additionally, as a whole, the Test Pyramid provides a clear and ready framework on how one should structure tests to get a more reliable system.

Still, despite all these benefits, it is not free of drawbacks, which I will describe in the following section.

Why It Is Not Enough

The first and most important problem with the original test pyramid is its over-reliance on unit tests. Such over-reliance introduces a set of problems to our application:

Striving for high unit-test coverage in your application is not necessarily a good idea. While unit tests are fast and easy to build, it is very easy to dig too deep and tie them to implementation details. In such a case, any further change to the component may require a lot of additional work on its tests.
Unit tests are not suitable for every phase of the project life cycle. Sometimes writing proper unit tests may not be possible at all, and you will have to rely heavily on mocks.
The current shape of the pyramid can give a false sense of security, as only a small number of tests actually exercise the “living, breathing” system. While everything may appear right at the unit and integration levels, the system may not work correctly as a whole.
In its current shape, the pyramid leaves little room for non-functional tests, like security or performance tests. It also does not mention contract or smoke tests.

Last but not least, remember that the test pyramid is a concept, and as with every concept, there is no need to blindly adhere to it if you do not see the sense in it. Remove one layer or more of the pyramid if it does not make sense for you.

Test Pyramid Per Use Case

If the original test pyramid is not enough and I still want to have some guidelines for tests, what then? Well, let’s throw the test pyramid away and just make a priority list of tests. Let’s iterate from the most to the least important type of tests that you need to have. Additionally, let’s make it on a case-by-case basis.

Change Heavy

Let’s start with the change-heavy case. It does not have to be a startup; it can be any type of greenfield project or just a new service. Here you can go with even zero tests; you probably need velocity and quick customer feedback, not tests. You need the freedom to break stuff and rebuild it quickly, not to rewrite all the tests from the ground up.

If you do write tests here, I would recommend focusing on E2E tests for the paths that are the most crucial for you: the paths that are your main selling points and competitive advantages. While E2E tests can get in the way when you need more velocity, I believe such a setup will benefit you the most and will give you feedback on the operation of your most important parts.

I would recommend some unit tests if you have some algorithm-heavy or complex logic inside your codebase, especially if it is crucial for your operations and impacts customers directly.

Moreover, I would suggest doing some performance tests before going live. Going viral on day one because your service collapsed under load is probably not the desired result.

If, by some miracle, you still have time to spare, set up some monitoring for the service. Trust me, it will be worth the time and effort.

Stable

In contrast to the change-heavy case, where everything may need to be changed and rewritten from scratch, here we have a system without such events, at least not frequently. Changes are infrequent, or they impact only a small subset of features.

In such a case, I would recommend the following structure: the required integration tests, E2E tests, smoke tests, possibly security and performance tests, and contract tests if you are exposing an API.

Following such a structure will give you:

Real-life guarantees as to your system’s operations
Freedom to change the underlying implementation without the need to change your tests
A tool for finding problems in your integrations with 3rd party providers
A tool to quickly ensure your system is working correctly after deploying it
A lot of insight from security and performance tests
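The "quickly ensure your system is working correctly after deploying it" point is what smoke tests cover. A minimal sketch in Python follows; the paths in `CRITICAL_PATHS` and the `fetch_status` callable are assumptions made for illustration, not part of any real system.

```python
from typing import Callable, Dict

# Hypothetical critical paths and the HTTP status each one should return.
CRITICAL_PATHS: Dict[str, int] = {"/health": 200, "/login": 200, "/api/orders": 200}


def run_smoke_checks(fetch_status: Callable[[str], int],
                     expected: Dict[str, int] = CRITICAL_PATHS) -> Dict[str, bool]:
    """Call every critical path once and record whether the status matched."""
    results = {}
    for path, expected_status in expected.items():
        try:
            results[path] = fetch_status(path) == expected_status
        except Exception:
            # A connection error counts as a failed check, not a crashed run.
            results[path] = False
    return results


def deployment_looks_healthy(results: Dict[str, bool]) -> bool:
    """The deployment passes only if every critical path responded as expected."""
    return all(results.values())
```

In a real pipeline, `fetch_status` would wrap an HTTP client pointed at the freshly deployed instance. Passing it in as a parameter keeps the check itself trivial to run against a fake.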

Service-Oriented Architecture

This case is a tricky one, as different services may be owned by different teams, and in general, it should be their decision how they want to test their component. However, I believe that there should be a recommendation or best practice to have contract tests for every component that exposes any type of API. Following it gives you extra guarantees after any type of change in one of your services.

If your design is mature enough, you can try introducing chaos engineering and see what results it will yield. System-wide pen tests can also be a good idea, and they are better done collectively than service by service, since some problems only show up in the system as a whole.

Besides that, I would recommend having system-wide requirements for observability — maybe some preset dashboards, alerts, and system-wide best practices. I think it will provide the teams with some frameworks they can easily adapt for their unique cases.

As for the individual services, I would not recommend anything specific; pick the tests that suit your use case the best.

Monolith

This case is a combination of all the previous ones. I recommend choosing your approach based on how frequent the changes are and what is changing. Remember to take into consideration the coupling between different components inside the monolith.

If you frequently change the inside of the monolith, not the interface, then go for E2E tests. On the other hand, if you frequently revise the API, then go for tests as close to unit tests as you can get. Do the same if you cannot set up E2E tests at all, or if doing so is too complex to be worth it.

If there is a high coupling between different components, or the boundaries between them are blurry, maybe try writing something akin to “E2E tests” on a higher component level.

If they are not there yet, try to set up well-defined logs, metrics, and possibly alerts, as close to a per-component basis as possible.

Test Pyramid Common Parts

Besides the structures I mentioned earlier, there are a few additional tools that can help you build more reliable systems. Not all of them are mandatory — maybe besides monitoring (this one, in my opinion, is a must-have). Pick the ones that you think will help you.

However, try to think through all of them; I believe that it will be time well spent, nevertheless.

Performance Tests

While not all systems and modules have strict performance requirements, it may be beneficial to have some performance tests.

They can provide additional insights for our product or business:

We know how far we can scale if the need arises at some point.
We can notice that some feature negatively impacts our performance.

I know it may not be the most crucial part for non-critical systems. However, at least we know about the issue and can make a decision on what to do with it instead of just letting it through.
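As a rough illustration of how a feature that hurts performance can be noticed, the sketch below measures an operation repeatedly and compares the 95th percentile against a latency budget. It uses only the Python standard library. The function names and the budget are hypothetical, and a real performance test would run against a production-like setup rather than in-process.

```python
import time
from statistics import quantiles
from typing import Callable, List


def measure_latencies(operation: Callable[[], object], runs: int = 50) -> List[float]:
    """Run the operation repeatedly and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def p95(durations: List[float]) -> float:
    """95th percentile of the measured durations (needs at least two samples)."""
    return quantiles(durations, n=100)[94]


def within_budget(durations: List[float], budget_seconds: float) -> bool:
    """True when the 95th percentile stays at or below the agreed budget."""
    return p95(durations) <= budget_seconds
```

Comparing a percentile rather than the average keeps a few slow outliers from hiding behind many fast calls, which is usually closer to what users experience.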

Pen-Tests/Security Tests

Again, as with performance tests, not all services and systems require these. Nevertheless, it may be beneficial to at least entertain the idea. You may find some interesting insights along the way. The exact scope and scale greatly depend on a number of various factors. If you want to know more about security, I have written on this topic in more detail elsewhere.

ArchUnit Tests

I think that for all four cases, it may be worth trying to write some tests in ArchUnit fashion, at least once your code structure stabilizes. While it may seem like wasted time, it will certainly help you keep your code in shape for longer.
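ArchUnit itself is a Java library, so the sketch below is not ArchUnit code. It only illustrates the same idea, an architecture rule expressed as a test, in Python with the standard `ast` module. The layer names `domain` and `infrastructure` are hypothetical.

```python
import ast


def imported_modules(source: str) -> set:
    """Collect the top-level package names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return modules


def forbidden_imports(source: str, forbidden: set) -> set:
    """Return the forbidden packages that the given source imports."""
    return imported_modules(source) & forbidden


# Example rule: code in the domain layer must not depend on infrastructure.
domain_source = "import dataclasses\nfrom infrastructure.db import Session\n"
assert forbidden_imports(domain_source, {"infrastructure"}) == {"infrastructure"}
```

A real check would read every file of the layer from disk and fail the build on the first violation.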

Observability

Tests are not the only thing that you will need to create robust systems. The whole infrastructure part around your system may be even more crucial than the tests in ensuring the flawless operation of your systems.

As an addition to your tests, you should also have good logging, metrics, and possibly alerts. They will give you additional insight into the operations of your systems. They will also polish some rough edges around your tests and may help identify some bottlenecks not caught in the tests.
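A minimal sketch of the three pieces working together, logs, metrics, and an alert condition, could look like this in Python. The `orders` component, the metric names, and the 50% threshold are hypothetical, and the `Counter` only stands in for a real metrics client.

```python
import logging
from collections import Counter

logger = logging.getLogger("orders")  # hypothetical component name
metrics = Counter()                   # stand-in for a real metrics client


def place_order(order_id: str, amount: float) -> bool:
    """Accept or reject an order, leaving a log line and a metric either way."""
    if amount <= 0:
        metrics["orders.rejected"] += 1
        logger.warning("order rejected: id=%s amount=%s", order_id, amount)
        return False
    metrics["orders.accepted"] += 1
    logger.info("order accepted: id=%s amount=%s", order_id, amount)
    return True


def should_alert(counters: Counter, max_rejected_ratio: float = 0.5) -> bool:
    """Fire when rejected orders exceed the given share of all orders."""
    total = counters["orders.accepted"] + counters["orders.rejected"]
    return total > 0 and counters["orders.rejected"] / total > max_rejected_ratio
```

The point is less the exact mechanics and more that every outcome leaves a trace you can count and alert on, including outcomes that no test anticipated.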

Chaos Engineering/Testing

This is probably the most complex concept to implement correctly. Deliberately introducing disruptions or failures into an otherwise perfectly working system may not seem like the brightest idea. Yet it can help identify weaknesses and problems that would not show up any other way.

However, this type of “test” is very, very complex. Introducing failures, whether intentional or not, is never fully safe. Before going head-on with this, double-check that your software and infrastructure are actually ready to survive it.
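At a very small scale, the core idea of intentional and traceable failures can be sketched as a wrapper that makes a dependency fail at a configurable rate, so that you can observe whether the calling code copes. Everything below is a hypothetical, in-process illustration in Python; real chaos engineering works at the infrastructure level and needs far more safeguards.

```python
import random
from typing import Callable


class InjectedFailure(Exception):
    """Marks a failure that was introduced on purpose, so it stays traceable."""


def with_chaos(operation: Callable[[], str], failure_rate: float,
               rng: random.Random) -> Callable[[], str]:
    """Wrap an operation so that it fails with the given probability."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise InjectedFailure("chaos: injected failure")
        return operation()
    return wrapped


def call_with_retry(operation: Callable[[], str], attempts: int = 3) -> str:
    """The resilience mechanism under test: retry a few times, then give up."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except InjectedFailure as error:
            last_error = error
    raise last_error
```

Passing a seeded `random.Random` instance makes a failing run reproducible, and the dedicated exception type makes injected failures easy to tell apart from genuine ones in the logs.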

Test Pyramid Trade-offs and Considerations

Before we jump to the conclusion, there are a couple of trade-offs and assumptions that I think you should take into consideration while picking the tests that you want to use:

Time Limits

One of the considerations when picking which tests to focus on is time restrictions. If you have very strict limits on how long your tests can run, then focusing on unit tests and some integration tests would be better than going for a full E2E test set. If you have no such limits, the opposite holds.

Integration Tests

In my opinion, a database is not a good case for integration tests nowadays. Integration tests should be used only for 3rd-party services that have complex behavior and cannot be easily tested in E2E tests.

If you have such dependencies in your system, then that is, in my opinion, the only valid reason to write integration tests. The database layer can be tested in the E2E test layer.

Unit Tests

I believe that unit tests should only cover the algorithm/logic-heavy pieces of code. There is no point in trying to reach higher coverage tiers with unit tests. In my opinion, it is better to focus on E2E tests. Sometimes, especially for poorly designed architectures, writing actual unit tests is much harder than it looks.

Setup Complexity

In some cases, creating E2E or unit tests may not be an option. In such a case, pick the type that is easier to set up and maintain and gives you more reliability. It may also be reasonable to change your architecture or design to make it more testable.

Over-Reliance on Mocks

While writing any type of test, be careful not to overuse mocking and/or stubbing. You can easily start testing mock and stub behaviors instead of the actual code.
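The difference is easiest to see side by side. In the sketch below, `PriceService` and `InMemoryPriceRepository` are hypothetical classes invented for the example. The first test asserts only on the mock and would keep passing even if the discount formula were wrong, while the second one checks the observable result.

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical class under test: applies a discount to a stored price."""

    def __init__(self, repository):
        self.repository = repository

    def discounted_price(self, product_id: str, percent: int) -> float:
        price = self.repository.get_price(product_id)
        return round(price * (100 - percent) / 100, 2)


class InMemoryPriceRepository:
    """Small fake with real behavior, used instead of a mock."""

    def __init__(self, prices):
        self.prices = dict(prices)

    def get_price(self, product_id: str) -> float:
        return self.prices[product_id]


def test_that_only_checks_the_mock():
    # Weak: this verifies the interaction with the mock, not the computed price.
    repository = Mock()
    repository.get_price.return_value = 100.0
    PriceService(repository).discounted_price("book", 10)
    repository.get_price.assert_called_once_with("book")


def test_that_checks_behavior():
    # Stronger: this asserts the result the caller actually depends on.
    repository = InMemoryPriceRepository({"book": 100.0})
    assert PriceService(repository).discounted_price("book", 10) == 90.0
```

Mocks are still useful for dependencies you cannot run in a test, but the assertion should target what the code under test produces.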

Test Implementation

For unit tests, do not go too deep into testing implementation details. Try to test interfaces, not the content of your methods. For E2E tests, try to use as many of the actual components as you can. Do not write your own stubs until you have to; Testcontainers may come in very handy here.
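As a small illustration of testing the interface rather than the content, consider the hypothetical `Inventory` class below. The test goes only through public methods, so it keeps passing if the internal storage is later replaced.

```python
class Inventory:
    """Hypothetical class; how stock is stored is an implementation detail."""

    def __init__(self):
        self._stock = {}

    def add(self, item: str, quantity: int) -> None:
        self._stock[item] = self._stock.get(item, 0) + quantity

    def remove(self, item: str, quantity: int) -> None:
        if self.available(item) < quantity:
            raise ValueError(f"not enough {item} in stock")
        self._stock[item] -= quantity

    def available(self, item: str) -> int:
        return self._stock.get(item, 0)


def test_through_the_interface():
    inventory = Inventory()
    inventory.add("apple", 5)
    inventory.remove("apple", 2)
    # Asserting on available() survives a change of the storage format.
    assert inventory.available("apple") == 3
    # By contrast, asserting on inventory._stock would pin the test to the dict.
```

A test written against `_stock` would have to be rewritten together with the class, which is exactly the extra work that makes deep unit testing expensive.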

Summary

Let’s start with a table to show concepts from previous paragraphs in a clear and concise manner.

Per Type Of Environment You Want To Run Your Tests

Type	Base	Optional
Change Heavy	E2E tests for crucial parts of the API; good observability pipeline (from logs to alerts); smoke tests for crucial paths	Performance tests for crucial parts; security tests; unit tests for logic/algorithm-heavy parts; integration tests for third-party services
Stable	E2E tests; good observability pipeline (from logs to alerts); integration tests for third-party services; unit tests for logic/algorithm-heavy parts	Performance tests for crucial parts; security tests; consider whether you need smoke tests and what their scope should be
Service Oriented Architecture	Contract tests for services exposing an API used by other services; choose the exact test setup on a per-service basis; design some base observability approaches for each team to adopt and extend	System-wide performance tests; system-wide security tests; consider chaos engineering
Monolith	Pick the tests that are easier to set up and maintain; good observability pipeline (from logs to alerts); smoke tests	System-wide performance tests; system-wide security tests

Per Test Type

Test Type / Environment	Change Heavy	Stable	Service Based	Monolith
Unit	Only for logic-heavy parts	Logic-heavy methods	Per service basis	Depends on the setup cost
Integration	Consider for third-party services	Third-party services	Per service basis	Third-party services
E2E	For critical paths	Mandatory	Per service basis	Depends on the setup cost
Contract	No	When and where applicable	Recommended for all services	No
Performance	For consideration	Yes	Per service basis	System wide
Smoke	Consider for critical path	Consider for critical path	Per service basis	Consider for critical path
Security	For consideration	Yes	System wide	System wide
Observability	Yes	Yes	Predefined rules	Yes

It is not a perfect silver bullet for every case — there is no such thing or recommendation. Everything here is based on different trade-offs, some of which are mentioned in the paragraphs above.

My final recommendation is: Just write the best tests that you can, given your design and possibilities.

Thank you for your time.

Published at DZone with permission of Bartłomiej Żyliński. See the original article here.

Opinions expressed by DZone contributors are their own.
