How to Interpret the Number of Spring ApplicationContexts in Integration Tests
When optimizing Spring Boot integration tests, developers often focus on obvious metrics, but they do not always explain why an integration test suite is slow.
Join the DZone community and get the full member experience.
Join For FreeWhen optimizing Spring Boot integration tests, developers often focus on obvious metrics: total build time, test execution time, CPU usage, memory consumption, or the number of failed tests. These metrics are useful, but they do not always explain why an integration test suite is slow. One of the most important hidden metrics in Spring Boot integration testing is the number of distinct ApplicationContext instances created during the test run, check out my other article.
Spring’s TestContext framework can cache and reuse ApplicationContext between test classes, but only if the effective test configuration is the same. If the configuration differs, Spring has to create another context. In large enterprise applications, this can become expensive very quickly.
- How can the number of contexts correctly interpreted?
- If a test suite creates two contexts, is that good?
- If it creates six contexts, is that acceptable?
- If it creates twenty contexts, is that already a design smell?
- And most importantly: where should such a judgment come from?
Spring itself does not define a universal threshold for a “good” or “bad” number of cached ApplicationContext instances. However, the official documentation explicitly points out that a large number of loaded contexts can make a test suite unnecessarily slow. This means the number of contexts is not just an implementation detail. It is a relevant diagnostic signal.
This article explains how I derived a practical interpretation table for a real-world Spring Boot integration test suite and why such a table should be understood as a case-study heuristic, not as a universal Spring Framework rule.
Test Grouping Is a Valid Concept
General testing research supports that tests can be grouped by similarity, cost, coverage, or runtime behavior. This is highly relevant for Spring Boot integration tests. In Spring Boot integration testing, MergedContextConfiguration may be interpreted as one practical grouping dimension: tests with the same effective Spring configuration belong to the same context group.
In this case, similarity means shared Spring test configuration. That does not mean all tests should use the same context. It means that tests should not accidentally create different contexts when they are actually testing under the same architectural conditions.
Spring’s Context Cache as a Framework-Specific Grouping Mechanism
Spring Boot integration tests are not plain unit tests. They often require infrastructure such as dependency injection, database configuration, security configuration, web layer configuration, mock infrastructure, external API clients, messaging components, or tenant-specific setup.
Spring’s TestContext framework handles this through the ApplicationContext.
The framework can reuse a context if the effective configuration is the same. The cache key is based on configuration parameters such as configuration classes, active profiles, property sources, context customizers, initializers, and other test context settings. Spring’s documentation describes this context caching mechanism and explains that contexts can be reused when the same unique context configuration is encountered again. Let me explain.
Two tests may look similar to a developer but still produce different contexts if they use different profiles, properties, mocks, or imported configuration classes. They should normally produce separate context groups. For example, a database-focused test and a test involving an external OData destination may have different infrastructure requirements.
In that case, a separate context is not a problem. It reflects a real test configuration group. When every test class introduces a slightly different property, mock, or configuration import without a strong technical reason. Then the number of contexts grows not because the architecture requires it, but because the test suite has configuration drift.
Why Multiple Contexts Can Be Legitimate in Enterprise Applications
Spring Boot itself supports different testing styles. The documentation describes @SpringBootTest for loading the application context through SpringApplication, and it also provides more focused test annotations for specific slices of an application. Spring Boot’s test slices include annotations such as @WebMvcTest, @DataJpaTest, @JsonTest, and others. These annotations intentionally load only selected parts of the application and import different auto-configurations depending on the target slice.
Besides the Spring documentation, many community blogs report that different enterprise systems may have separate integration test groups, such as database-focused tests, web/controller tests, security-related tests, and so on. So, the goal should be to minimize unnecessary context fragmentation while preserving justified test configuration groups, instead of forcing the entire integration test suite into one ApplicationContext.
From Test Grouping to a Context-Count Heuristic
Based on this reasoning, I used the following interpretation in a case study:
- 1-3 application contexts show excellent context reuse,
- 4-8 are acceptable if justified,
- 10+ should be investigated, and a signal of a fragmented test configuration.
Let's discuss the numbers.
1-3: The most integration tests share the same effective configuration. For example:
Context 1: default integration test context
Context 2: database-specific context
Context 3: external-system-specific context
Such a structure is usually easy to understand. It suggests that the team has standardized its test profiles, properties, and infrastructure setup.
4-8: This is consistent with broader software-testing research, where test suites are not treated as one homogeneous block. They are often optimized, selected, prioritized, or clustered according to meaningful technical criteria such as coverage, execution cost, change relevance, or runtime behavior. For example:
Context 1: default SpringBootTest context
Context 2: database-heavy context
Context 3: external API integration context
Context 4: security-specific context
Context 5: multi-tenant context
Context 6: messaging context
Context 7: no-external-destination context
Context 8: migration-specific context
10+: Once the number of contexts reaches double digits, investigation becomes worthwhile. This does not automatically mean the test suite is badly designed. Community articles on Spring test optimization show that a very large enterprise platform with many modules, tenant variants, data stores, messaging systems, and external integrations may legitimately require more contexts. So, the number 10+ is not firm, but suggests that the risk of accidental fragmentation becomes higher.
Conclusion
Test grouping is a recognized concept in software-testing research. Large test suites are often optimized through minimization, selection, prioritization, and clustering. These techniques are based on the idea that tests have different costs, purposes, coverage, runtime behavior, and relevance. For Spring Boot integration tests, context reuse is a framework-specific grouping criterion. (Use the method of test grouping to create Spring application contexts)
Tests with the same effective MergedContextConfiguration belong to the same context group and can share the same cached ApplicationContext. Tests with genuinely different infrastructure needs may require different contexts.
Therefore, the goal is not to reduce every enterprise test suite to a single context. The goal is to distinguish between justified test configuration groups and accidental configuration fragmentation.
The shown numbers are a practical case-study heuristic, and not universal.
But the underlying principle is robust: A small number of well-defined context groups is healthy, but a growing number of slightly different contexts is a performance smell. That principle connects Spring’s TestContext cache mechanism with a broader idea from software-testing research: large test suites should be structured intentionally, not allowed to fragment accidentally.
Opinions expressed by DZone contributors are their own.
Comments