AI-Powered Spring Boot Concurrency: Virtual Threads in Practice

AI does not handle threads; it handles decisions. This article looks into how AI can help set safe concurrency limits for Spring Boot virtual threads.

Lavi Kumar

Feb. 03, 26 · Analysis

Modern microservices face a common challenge: running many tasks concurrently without overloading the downstream systems they call. Tuning traditional thread pools involves a lot of guesswork, and the resulting settings often fail under real-world traffic. With the arrival of virtual threads in Java 21 and the growth of AI-powered engineering tools, we can build concurrency adapters that scale safely and adapt to the workload.

This article provides a step-by-step guide to a practical proof-of-concept using Spring Boot that employs AI (OpenAI/Gemini) to assist in runtime concurrency decisions. It also integrates virtual threads and bulkheads to ensure a good balance between throughput and the safety of downstream systems.

Why Concurrency Decisions Need Intelligence, Not Just Thread Pools

Spring Boot microservices often perform parallel fan-out: they make several downstream calls for each incoming HTTP request. In the past, developers tuned the following largely by intuition:

Thread pools
Executor settings
Bulkheads and timeouts

This approach is brittle when traffic, latency, or downstream behavior changes.

Even with virtual threads that remove strict limits on thread counts, services still need protections to avoid:

Overloaded databases
Thread scheduling conflicts
Retry storms
Poor tail latency

This is where AI can assist by offering contextual suggestions instead of fixed configurations.

Solution Summary

Our proof of concept includes three key elements:

Spring Boot with virtual threads enabled. This uses Java 21’s lightweight threads so that blocking I/O does not tie up scarce platform threads.
AI-driven concurrency advisor. This is a modular component that asks one of the following providers to suggest a maximum concurrency limit (the maximum number of concurrent downstream calls):
- OpenAI-compatible endpoints
- Google’s Gemini
Bulkhead pattern implemented with semaphores. This ensures that no more than the recommended number of tasks run at the same time.

The objective: allow AI to assist in identifying the concurrency level that a specific workload can handle.

Architecture

Here’s how the request flows:

The client makes a call to /api/aggregate?fanout=20&forceAi=true.
The controller sends the fan-out information to the AI Concurrency Advisor.
The advisor utilizes either the AI provider or a heuristic fallback.
It returns a JSON object containing maxConcurrency.
A semaphore bulkhead is established.
Tasks are processed on virtual threads.
Responses are gathered and sent back.

Note that the advisor does not run threads; it only suggests limits.

Implementation Details

Enabling Virtual Threads

The application.yml configuration in Spring Boot enables virtual threads:

    YAML

    spring:
      threads:
        virtual:
          enabled: true

With this property set (Spring Boot 3.2 or later, running on Java 21), the framework handles requests and runs asynchronous tasks on virtual threads by default.
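
As a quick check of what this setting builds on, the sketch below runs a blocking task on a virtual-thread-per-task executor and reports whether the executing thread was virtual. It is plain Java 21 outside Spring, and the class name VirtualThreadCheck is illustrative, not part of the article's project.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative helper (hypothetical name), not taken from the article's code.
class VirtualThreadCheck {

    // Runs one blocking task on a virtual thread and reports whether
    // the thread that executed it was virtual.
    static boolean runsOnVirtualThread() throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<Boolean> result = executor.submit(() -> {
                Thread.sleep(Duration.ofMillis(10)); // simulated blocking I/O
                return Thread.currentThread().isVirtual();
            });
            return result.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("virtual = " + runsOnVirtualThread());
    }
}
```

Inside a Spring Boot application, the same Thread.currentThread().isVirtual() call in a controller method is a simple way to confirm that the property took effect.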

AI Concurrency Advisor

We define an AiConcurrencyAdvisor interface. Its implementations are:

OpenAI client
Gemini client
Heuristic fallback

Sample JSON request body used by the OpenAI client:

    JSON

    {
      "model": "gpt-4.1-mini",
      "temperature": 0.1,
      "messages": [
        {"role": "system", "content": "You are a senior JVM performance engineer…"},
        {"role": "user", "content": "Operation: aggregate\nFanoutRequested: 50…"}
      ]
    }

The service parses the JSON returned by the model and extracts a maxConcurrency value, which it then bounds to a safe range before use.
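
The extraction step might look like the following minimal sketch. It assumes the model replies with a JSON object that contains an integer maxConcurrency field. The class name MaxConcurrencyParser is hypothetical, and a regular expression is used only to keep the example dependency-free. A real Spring Boot service would more likely use a JSON library such as Jackson.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a non-negative integer "maxConcurrency" out of a model reply.
class MaxConcurrencyParser {

    // Accepts 1 to 6 digits; rejects negatives, decimals, and absurdly long numbers.
    private static final Pattern FIELD =
        Pattern.compile("\"maxConcurrency\"\\s*:\\s*(\\d{1,6})(?![\\d.])");

    // Returns the parsed value, or empty when the reply is missing or unusable,
    // so the caller can switch to a heuristic fallback.
    static OptionalInt parse(String modelReply) {
        if (modelReply == null) {
            return OptionalInt.empty();
        }
        Matcher m = FIELD.matcher(modelReply);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }
}
```

Returning an empty result instead of throwing keeps the failure path explicit: an unparseable reply is treated the same way as an unavailable provider.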

Bulkhead With Semaphore

After a recommendation is received:

    Java

    Semaphore semaphore = new Semaphore(maxConcurrency);

Each downstream task acquires a permit before it executes, so at most maxConcurrency tasks run at the same time, no matter how many virtual threads exist.
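
To see the bulkhead effect in isolation, the sketch below starts many blocking tasks on virtual threads behind a semaphore and records the highest number that ran at once. The class BulkheadDemo and the 20 ms sleep that stands in for a downstream call are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative demo (hypothetical name), not taken from the article's project.
class BulkheadDemo {

    // Runs `tasks` blocking tasks on virtual threads behind a semaphore with
    // `maxConcurrency` permits and returns the peak number running at once.
    static int peakConcurrency(int tasks, int maxConcurrency) {
        Semaphore semaphore = new Semaphore(maxConcurrency);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    semaphore.acquire(); // waits while all permits are taken
                    try {
                        int now = running.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // simulated downstream call
                    } finally {
                        running.decrementAndGet();
                        semaphore.release();
                    }
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak = " + peakConcurrency(50, 5));
    }
}
```

With 50 tasks and 5 permits, the reported peak never exceeds 5 even though 50 virtual threads are created.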

Key Code Snippets

AI Advisor Interface

This abstraction keeps AI optional, interchangeable, and safe to use.

    Java

    public interface AiConcurrencyAdvisor {
      AdvisorDecision recommend(AdvisorInput input, boolean forceAi);
    }

Separates AI logic from business logic
Enables switching between Gemini, OpenAI, or a heuristic fallback
Maintains testable and auditable concurrency decisions
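
One way to get this interchangeability is a small wrapper that delegates to an AI-backed advisor and switches to a local one when the call fails. The sketch below is an assumption about how that could be wired, not the article's actual code. FallbackAdvisor is a hypothetical name, AdvisorInput is trimmed to two fields, and AdvisorDecision is given a hypothetical source field so the example compiles on its own.

```java
// Hypothetical minimal types, defined here so the sketch is self-contained.
record AdvisorInput(String operation, int fanoutRequested) {}
record AdvisorDecision(int maxConcurrency, String source) {}

interface AiConcurrencyAdvisor {
    AdvisorDecision recommend(AdvisorInput input, boolean forceAi);
}

// Tries the primary (AI-backed) advisor first and falls back to a local
// advisor if the primary throws, for example on a network or parsing error.
class FallbackAdvisor implements AiConcurrencyAdvisor {
    private final AiConcurrencyAdvisor primary;
    private final AiConcurrencyAdvisor fallback;

    FallbackAdvisor(AiConcurrencyAdvisor primary, AiConcurrencyAdvisor fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    @Override
    public AdvisorDecision recommend(AdvisorInput input, boolean forceAi) {
        try {
            return primary.recommend(input, forceAi);
        } catch (RuntimeException e) {
            // The fallback never calls an AI provider, so forceAi is dropped.
            return fallback.recommend(input, false);
        }
    }
}
```

Because both sides implement the same interface, tests can pass in lambdas or stubs for either advisor without touching the business logic.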

Advisor Input Model

The quality of AI decisions depends on the context you give.

    Java

    public record AdvisorInput(
        String operation,
        int fanoutRequested,
        long expectedDownstreamLatencyMs,
        int cpuCores,
        Map<String, Object> hints
    ) {}

Rather than guessing at concurrency limits, we give the advisor:

Fan-out size
Latency expectations
CPU capacity
Workload hints

This reflects the thought process of a senior engineer regarding concurrency.
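
The same inputs can also drive the heuristic fallback. The sketch below shows one possible rule of thumb. The latency thresholds and per-core multipliers are illustrative assumptions rather than values from the article, and HeuristicAdvisor is a hypothetical name.

```java
import java.util.Map;

// Same shape as the article's input record, repeated so the sketch compiles on its own.
record AdvisorInput(
    String operation,
    int fanoutRequested,
    long expectedDownstreamLatencyMs,
    int cpuCores,
    Map<String, Object> hints
) {}

// Hypothetical heuristic: allow more in-flight calls per core when the
// downstream call is expected to spend longer waiting on I/O.
class HeuristicAdvisor {

    static int suggest(AdvisorInput in) {
        // Assumed thresholds and multipliers; tune them against real measurements.
        int perCore = in.expectedDownstreamLatencyMs() <= 50 ? 2
                    : in.expectedDownstreamLatencyMs() <= 500 ? 4
                    : 8;
        int raw = Math.max(1, in.cpuCores()) * perCore;
        // Never suggest more than the requested fan-out, and never less than 1.
        return Math.max(1, Math.min(raw, in.fanoutRequested()));
    }
}
```

For example, with 4 cores, an expected latency of 200 ms, and a fan-out of 50, this rule suggests 16 concurrent calls. With a fan-out of 5 it suggests 5.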

AI Decision Sanitization

Even AI recommendations must be constrained. The line below clamps the suggested value to the range from 1 to the requested fan-out.

    Java
   
   int maxConcurrency = Math.max(1, Math.min(decision.maxConcurrency(), fanout));

Stops uncontrolled concurrency
Safeguards downstream systems
Ensures AI output adheres to system rules

AI provides advice; the system makes the decision.
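
The clamp can be wrapped in a small, testable helper. The version below also applies a service-wide ceiling, which is an assumption added for illustration. The article's one-liner bounds the value by the fan-out only, and the names ConcurrencyLimits and hardCap are hypothetical.

```java
// Hypothetical helper that bounds an advisor suggestion before it is used.
class ConcurrencyLimits {

    // Returns a value in [1, min(fanout, hardCap)], whatever the advisor suggested.
    static int sanitize(int suggested, int fanout, int hardCap) {
        int upper = Math.max(1, Math.min(fanout, hardCap));
        return Math.max(1, Math.min(suggested, upper));
    }

    public static void main(String[] args) {
        System.out.println(sanitize(500, 20, 64)); // suggestion above fan-out
        System.out.println(sanitize(0, 20, 64));   // suggestion below 1
    }
}
```

Keeping this logic in one place means every advisor implementation, AI-backed or heuristic, passes through the same guardrail.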

Service fan-out logic:

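    Java

    // Needs java.util.*, java.util.concurrent.*, and java.util.concurrent.atomic.AtomicInteger.
    try (ExecutorService vtExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
      List<CompletableFuture<DownstreamResponse>> futures = new ArrayList<>(fanout);
      AtomicInteger idx = new AtomicInteger(0);

      for (int i = 0; i < fanout; i++) {
        futures.add(CompletableFuture.supplyAsync(() -> {
          boolean acquired = false;
          try {
            // Wait for a bulkhead permit; blocking is cheap on a virtual thread.
            semaphore.acquire();
            acquired = true;

            int n = idx.incrementAndGet();
            return downstream.call("ds-" + n, id, latencyMs);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new DownstreamResponse("interrupted", "INTERRUPTED", 0);
          } finally {
            // Release only if the permit was actually acquired.
            if (acquired) semaphore.release();
          }
        }, vtExecutor).orTimeout(3, TimeUnit.SECONDS)); // fails the future; does not interrupt the task
      }

      // Assumed completion of the excerpt: gather results, mapping a timed-out
      // or failed future to a placeholder response instead of throwing.
      List<DownstreamResponse> responses = futures.stream()
          .map(f -> f.exceptionally(ex -> new DownstreamResponse("timeout", "FAILED", 0)).join())
          .toList();
    }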

This approach combines virtual threads with a bulkhead, allowing for safe scaling of blocking calls.

Running the Project

Set optional environment variables for the AI provider.

Execute:

    Shell

    ./gradlew bootRun

Test the endpoints:

    Shell

    curl "http://localhost:8080/api/aggregate?id=123&fanout=20"

Include &forceAi=true to enforce AI usage even if no key is set.

When to Utilize AI-Driven Concurrency

This approach is particularly beneficial when:

There is variability downstream
Latency patterns are uncertain
Manual adjustments are expensive
You need clear backpressure choices

AI suggestions must always be bounded and validated against heuristics so that the service stays safe when an LLM response is unexpected or malformed.

Conclusion

This proof of concept shows how AI (Gemini/OpenAI) can help with Spring Boot concurrency design. It does not replace human judgment but provides contextual recommendations based on workload characteristics. When paired with Java 21 virtual threads, this method allows for scalable, safe, and observable microservices.
