AI-Powered Spring Boot Concurrency: Virtual Threads in Practice
AI does not handle threads; it handles decisions. This article looks into how AI can help set safe concurrency limits for Spring Boot virtual threads.
Modern microservices face a common challenge: running many tasks at once without putting too much pressure on downstream systems. Tuning traditional thread pools often involves a lot of guesswork, and the resulting settings rarely hold up under real-world traffic. However, with the arrival of virtual threads in Java 21 and the growth of AI-powered engineering tools, we can create smart concurrency adapters that scale in a safe and intelligent way.
This article provides a step-by-step guide to a practical proof-of-concept using Spring Boot that employs AI (OpenAI/Gemini) to assist in runtime concurrency decisions. It also integrates virtual threads and bulkheads to ensure a good balance between throughput and the safety of downstream systems.
Why Concurrency Decisions Need Intelligence, Not Just Thread Pools
Spring Boot microservices often execute parallel fan-out, which means they make several downstream calls for each incoming HTTP request. In the past, developers adjusted:
- Thread pools
- Executor settings
- Bulkheads and timeouts
based on their gut feelings. Settings chosen this way are brittle: they stop fitting as soon as traffic, latency, or downstream behavior shifts.
Even with virtual threads that remove strict limits on thread counts, services still need protections to avoid:
- Overloaded databases
- Thread scheduling conflicts
- Retry storms
- Poor tail latency
This is where AI can assist by offering contextual suggestions instead of fixed configurations.
Solution Summary
Our proof of concept includes three key elements:
- Spring Boot with virtual threads enabled. Java 21’s lightweight threads keep blocking I/O from tying up a limited pool of platform threads.
- AI-driven concurrency advisor. A modular component that asks either an OpenAI-compatible endpoint or Google’s Gemini to suggest a maximum concurrency limit (the maximum number of concurrent requests).
- Bulkhead pattern implemented with semaphores. This guarantees that only the recommended number of tasks operate at the same time.
The objective: allow AI to assist in identifying the concurrency level that a specific workload can handle.
Architecture
Here’s how the request flows:
- The client makes a call to /api/aggregate?fanout=20&forceAi=true.
- The controller sends the fan-out information to the AI Concurrency Advisor.
- The advisor utilizes either the AI provider or a heuristic fallback.
- It returns a JSON object containing maxConcurrency.
- A semaphore bulkhead is established.
- Tasks are processed on virtual threads.
- Responses are gathered and sent back.
Note that the advisor does not run threads; it only suggests limits.
Implementation Details
Enabling Virtual Threads
The application.yml configuration in Spring Boot enables virtual threads:
spring:
  threads:
    virtual:
      enabled: true
With this property set (it requires Spring Boot 3.2 or later running on Java 21), the framework handles incoming requests and asynchronous tasks on virtual threads by default.
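To make "running on a virtual thread" concrete outside of Spring, here is a minimal plain Java 21 sketch. The class and method names (VirtualThreadCheck, runsOnVirtualThread) are made up for illustration and are not part of the project; the sketch only shows that a task submitted to a virtual-thread-per-task executor reports itself as virtual.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not part of the article's project.
class VirtualThreadCheck {

    // Runs one task on a virtual-thread-per-task executor and reports
    // whether that task observed itself running on a virtual thread.
    static boolean runsOnVirtualThread() {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().isVirtual(), executor)
                    .join();
        }
    }

    public static void main(String[] args) {
        System.out.println("task ran on a virtual thread: " + runsOnVirtualThread());
    }
}
```

The same Thread.currentThread().isVirtual() check can be logged from a controller method to confirm that the Spring property took effect.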
AI Concurrency Advisor
We define an AiConcurrencyAdvisor interface. Its implementations are:
- OpenAI client
- Gemini client
- Heuristic fallback
Sample JSON request body used by the OpenAI client:
{
  "model": "gpt-4.1-mini",
  "temperature": 0.1,
  "messages": [
    {"role": "system", "content": "You are a senior JVM performance engineer…"},
    {"role": "user", "content": "Operation: aggregate\nFanoutRequested: 50…"}
  ]
}
The service parses the JSON returned by the model and extracts a maxConcurrency value, which is then constrained to a safe range.
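The parsing code itself is not shown here, so the following is only a sketch under stated assumptions: the model is assumed to reply with a small JSON object such as {"maxConcurrency": 12}, and a regular expression stands in for a real JSON library (a production service would more likely use something like Jackson). ModelReplyParser and parseMaxConcurrency are hypothetical names.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the reply shape {"maxConcurrency": <int>} is an assumption.
class ModelReplyParser {

    // Matches a non-negative integer of at most 9 digits (so it always fits in an int).
    // The lookahead rejects longer numbers and decimals instead of truncating them.
    private static final Pattern MAX_CONCURRENCY =
            Pattern.compile("\"maxConcurrency\"\\s*:\\s*(\\d{1,9})(?![\\d.])");

    // Returns the suggested limit, or empty when the reply has no usable value.
    static OptionalInt parseMaxConcurrency(String reply) {
        if (reply == null) {
            return OptionalInt.empty();
        }
        Matcher matcher = MAX_CONCURRENCY.matcher(reply);
        return matcher.find()
                ? OptionalInt.of(Integer.parseInt(matcher.group(1)))
                : OptionalInt.empty();
    }
}
```

Returning an empty OptionalInt rather than a default lets the caller decide when to use the heuristic fallback.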
Bulkhead With Semaphore
After a recommendation is received:
Semaphore semaphore = new Semaphore(maxConcurrency);
Each downstream task acquires a permit before it runs and releases it when it finishes. At most maxConcurrency tasks are therefore in flight at the same time, no matter how many virtual threads exist.
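The effect of the bulkhead can be observed with a small standalone sketch. It is not the project's code: SemaphoreBulkheadDemo and peakConcurrency are illustrative names, and Thread.sleep stands in for a blocking downstream call. The method starts many virtual threads and records the highest number that were inside the guarded section at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the article's project.
class SemaphoreBulkheadDemo {

    // Runs `tasks` blocking tasks on virtual threads behind a semaphore with
    // `maxConcurrency` permits; returns the peak number of tasks in flight.
    static int peakConcurrency(int tasks, int maxConcurrency) {
        Semaphore semaphore = new Semaphore(maxConcurrency);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> {
                    try {
                        semaphore.acquire();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return; // no permit was taken, so nothing to release
                    }
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // stand-in for a blocking downstream call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        semaphore.release();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight tasks: " + peakConcurrency(50, 5));
    }
}
```

Whatever the number of tasks, the reported peak cannot exceed the number of permits, which is the property the bulkhead relies on.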
Key Code Snippets
AI Advisor Interface
This abstraction makes AI optional, interchangeable, and secure.
public interface AiConcurrencyAdvisor {
    AdvisorDecision recommend(AdvisorInput input, boolean forceAi);
}
- Separates AI logic from business logic
- Enables switching between Gemini, OpenAI, or a heuristic fallback
- Keeps concurrency decisions testable and auditable
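As an illustration of how interchangeable implementations might look, here is a sketch of a heuristic fallback. Several things are assumptions: the fields of AdvisorDecision beyond maxConcurrency (the article only shows the maxConcurrency() accessor), the HeuristicAdvisor class name, and the per-core multiplier, which is an arbitrary placeholder rather than a tuned value. The types are redeclared without the public modifier so the block compiles on its own.

```java
import java.util.Map;

// Input record as shown in the article.
record AdvisorInput(String operation, int fanoutRequested, long expectedDownstreamLatencyMs,
                    int cpuCores, Map<String, Object> hints) {}

// Assumption: only maxConcurrency() is confirmed by the article; `source` is illustrative.
record AdvisorDecision(int maxConcurrency, String source) {}

interface AiConcurrencyAdvisor {
    AdvisorDecision recommend(AdvisorInput input, boolean forceAi);
}

// Hypothetical fallback: no model call, just a fixed rule of thumb.
class HeuristicAdvisor implements AiConcurrencyAdvisor {

    // Arbitrary placeholder, not a measured or recommended value.
    private static final int PER_CORE = 4;

    @Override
    public AdvisorDecision recommend(AdvisorInput input, boolean forceAi) {
        // forceAi is ignored here: this implementation never calls a model.
        int ceiling = Math.max(1, input.cpuCores()) * PER_CORE;
        int limit = Math.max(1, Math.min(input.fanoutRequested(), ceiling));
        return new AdvisorDecision(limit, "heuristic");
    }
}
```

Because callers depend only on the interface, a test can inject this class (or a stub returning a fixed value) in place of a provider-backed advisor.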
Advisor Input Model
The quality of AI decisions depends on the context you give.
public record AdvisorInput(
    String operation,
    int fanoutRequested,
    long expectedDownstreamLatencyMs,
    int cpuCores,
    Map<String, Object> hints
) {}
Rather than guessing a concurrency limit, we give the advisor:
- Fan-out size
- Latency expectations
- CPU capacity
- Workload hints
This reflects the thought process of a senior engineer regarding concurrency.
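One way these fields could be turned into the text sent to the model is sketched below. Only the first two labels (Operation and FanoutRequested) appear in the sample request earlier; the remaining labels, the AdvisorPrompt class, and the sample values are assumptions made for illustration.

```java
import java.util.Map;

// Input record as shown in the article.
record AdvisorInput(String operation, int fanoutRequested, long expectedDownstreamLatencyMs,
                    int cpuCores, Map<String, Object> hints) {}

// Hypothetical helper; labels after the first two are assumed, not taken from the project.
class AdvisorPrompt {

    // Builds an input from the current machine; latency and hints are sample values.
    static AdvisorInput sampleInput(int fanout) {
        return new AdvisorInput("aggregate", fanout, 200L,
                Runtime.getRuntime().availableProcessors(),
                Map.of("downstream", "http"));
    }

    // Renders the input as one "Label: value" pair per line.
    static String render(AdvisorInput in) {
        return String.join("\n",
                "Operation: " + in.operation(),
                "FanoutRequested: " + in.fanoutRequested(),
                "ExpectedDownstreamLatencyMs: " + in.expectedDownstreamLatencyMs(),
                "CpuCores: " + in.cpuCores(),
                "Hints: " + in.hints());
    }

    public static void main(String[] args) {
        System.out.println(render(sampleInput(50)));
    }
}
```

Keeping the rendering in one place makes it easy to log exactly what context the model saw for a given decision.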
AI Decision Sanitization
Even AI recommendations must be constrained.
int maxConcurrency = Math.max(1, Math.min(decision.maxConcurrency(), fanout));
- Stops uncontrolled concurrency
- Safeguards downstream systems
- Guarantees AI output adheres to system rules
In short: AI provides advice, and the system makes the decision.
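The clamp is easy to check in isolation. The sketch below wraps the same expression in a hypothetical helper (ConcurrencyClamp.clamp is an illustrative name) and lists a few worked values.

```java
// Hypothetical wrapper around the article's clamping expression.
class ConcurrencyClamp {

    // Keeps the suggestion within [1, fanout]; the lower bound wins if fanout < 1.
    static int clamp(int suggested, int fanout) {
        return Math.max(1, Math.min(suggested, fanout));
    }

    public static void main(String[] args) {
        System.out.println(clamp(500, 20)); // 20: suggestion above fan-out is capped
        System.out.println(clamp(8, 20));   // 8: in-range suggestion is kept
        System.out.println(clamp(0, 20));   // 1: zero is raised to the floor
        System.out.println(clamp(-7, 20));  // 1: negative is raised to the floor
    }
}
```

The floor of 1 matters because new Semaphore(0) would leave every task waiting for a permit that never arrives.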
Service fan-out logic:
// Assumes fanout, semaphore, downstream, id, and latencyMs are already in scope.
try (ExecutorService vtExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<CompletableFuture<DownstreamResponse>> futures = new ArrayList<>(fanout);
    AtomicInteger idx = new AtomicInteger(0);
    for (int i = 0; i < fanout; i++) {
        futures.add(CompletableFuture.supplyAsync(() -> {
            boolean acquired = false;
            try {
                semaphore.acquire(); // blocks this virtual thread until a permit is free
                acquired = true;
                int n = idx.incrementAndGet();
                return downstream.call("ds-" + n, id, latencyMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                return new DownstreamResponse("interrupted", "INTERRUPTED", 0);
            } finally {
                if (acquired) semaphore.release(); // release only a permit we hold
            }
        }, vtExecutor).orTimeout(3, TimeUnit.SECONDS));
    }
    // Gather the results from `futures` here.
} // close() waits for submitted tasks to finish
This approach combines virtual threads with a bulkhead, allowing for safe scaling of blocking calls.
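Gathering the responses is not shown in the snippet above. One possible way to do it, sketched under assumptions: the field names of DownstreamResponse are guessed from its three-argument constructor, and ResultGatherer, the "unknown" source label, and the TIMEOUT and FAILED status strings are made up. A future that fails or hits its orTimeout deadline is mapped to a placeholder response so that one slow call does not fail the whole aggregate.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

// Assumption: field names are guessed from the constructor call in the article.
record DownstreamResponse(String source, String status, long latencyMs) {}

// Hypothetical helper, not part of the article's project.
class ResultGatherer {

    // Waits for every future; failures become placeholder responses instead of exceptions.
    static List<DownstreamResponse> gather(List<CompletableFuture<DownstreamResponse>> futures) {
        return futures.stream()
                .map(future -> future.exceptionally(ResultGatherer::toPlaceholder))
                .map(CompletableFuture::join)
                .toList();
    }

    private static DownstreamResponse toPlaceholder(Throwable error) {
        // Dependent stages may wrap the original failure in CompletionException.
        Throwable root = (error instanceof CompletionException && error.getCause() != null)
                ? error.getCause()
                : error;
        String status = (root instanceof TimeoutException) ? "TIMEOUT" : "FAILED";
        return new DownstreamResponse("unknown", status, 0);
    }
}
```

Note that orTimeout only completes the future exceptionally; it does not interrupt the virtual thread still running the downstream call, so the permit is released only when that call returns.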
Running the Project
Set optional environment variables for the AI provider.
Execute:
./gradlew bootRun
Test the endpoints:
curl "http://localhost:8080/api/aggregate?id=123&fanout=20"
Include &forceAi=true to enforce AI usage even if no key is set.
When to Utilize AI-Driven Concurrency
This approach is particularly beneficial when:
- There is variability downstream
- Latency patterns are uncertain
- Manual adjustments are expensive
- You need clear backpressure choices
AI suggestions should always be clamped and checked against heuristics, so the service stays safe even when an LLM response is unexpected.
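A rough sketch of such a check follows. Everything in it is an assumption made for illustration: the GuardedAdvice class, the use of an IntSupplier to represent the provider call, and the rule that a suggestion more than four times the heuristic limit is treated as implausible.

```java
import java.util.function.IntSupplier;

// Hypothetical guard around an AI suggestion; thresholds are arbitrary placeholders.
class GuardedAdvice {

    // Assumption: anything above 4x the heuristic limit is considered implausible.
    private static final long PLAUSIBILITY_FACTOR = 4L;

    // aiSuggestion may throw (network error, unparsable reply); heuristicLimit is trusted.
    static int decide(IntSupplier aiSuggestion, int heuristicLimit, int fanout) {
        int suggested;
        try {
            suggested = aiSuggestion.getAsInt();
        } catch (RuntimeException e) {
            suggested = heuristicLimit; // provider failed: use the heuristic
        }
        boolean implausible = suggested < 1
                || suggested > PLAUSIBILITY_FACTOR * heuristicLimit;
        if (implausible) {
            suggested = heuristicLimit; // surprising value: use the heuristic
        }
        // Final clamp, same expression as in the sanitization step.
        return Math.max(1, Math.min(suggested, fanout));
    }
}
```

For example, with a heuristic limit of 16 and a fan-out of 50, a suggestion of 12 is used as is, while a suggestion of 100000 or a provider error both result in 16.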
Conclusion
This proof of concept shows how AI (Gemini/OpenAI) can help with Spring Boot concurrency design. It does not replace human judgment but provides contextual recommendations based on workload characteristics. When paired with Java 21 virtual threads, this method allows for scalable, safe, and observable microservices.