AI-Powered Spring Boot Concurrency: Virtual Threads in Practice

AI does not handle threads; it handles decisions. This article looks into how AI can help set safe concurrency limits for Spring Boot virtual threads.

By Lavi Kumar · Feb. 03, 26 · Analysis

Modern microservices face a common challenge: running many tasks concurrently without overloading the downstream systems they depend on. Tuning traditional thread pools involves a lot of guesswork, and the resulting settings often don't hold up under real-world traffic. With virtual threads finalized in Java 21 and the growth of AI-powered engineering tools, we can build concurrency adapters that scale safely and adapt to the workload.

This article provides a step-by-step guide to a practical proof-of-concept using Spring Boot that employs AI (OpenAI/Gemini) to assist in runtime concurrency decisions. It also integrates virtual threads and bulkheads to ensure a good balance between throughput and the safety of downstream systems.

Why Concurrency Decisions Need Intelligence, Not Just Thread Pools

Spring Boot microservices often execute parallel fan-out, which means they make several downstream calls for each incoming HTTP request. In the past, developers adjusted:

  • Thread pools
  • Executor settings
  • Bulkheads and timeouts

based on intuition. This approach breaks down when traffic, latency, or downstream behavior changes.

Even with virtual threads that remove strict limits on thread counts, services still need protections to avoid:

  • Overloaded databases
  • Thread scheduling conflicts
  • Retry storms
  • Poor tail latency

This is where AI can assist by offering contextual suggestions instead of fixed configurations.

Solution Summary

Our proof of concept includes three key elements:

  1. Spring Boot with virtual threads enabled. This uses Java 21’s lightweight threads so that blocking I/O no longer ties up a limited pool of platform threads.
  2. AI-driven concurrency advisor. This modular component asks one of the following providers to suggest a maximum concurrency limit (maximum concurrent requests):
    • OpenAI-compatible endpoints
    • Google’s Gemini
  3. Bulkhead pattern implemented with semaphores. This ensures that at most the recommended number of tasks run at the same time.

The objective: allow AI to assist in identifying the concurrency level that a specific workload can handle.

Architecture

Here’s how the request flows:

  • The client makes a call to /api/aggregate?fanout=20&forceAi=true.
  • The controller sends the fan-out information to the AI Concurrency Advisor.
  • The advisor utilizes either the AI provider or a heuristic fallback.
  • It returns a JSON object containing maxConcurrency.
  • A semaphore bulkhead is established.
  • Tasks are processed on virtual threads.
  • Responses are gathered and sent back.

Note that the advisor does not run threads; it only suggests limits.

Implementation Details

Enabling Virtual Threads

The application.yml configuration in Spring Boot enables virtual threads:

YAML
 
spring:
  threads:
    virtual:
      enabled: true


With this property set (Spring Boot 3.2 or later running on Java 21), the embedded web server handles requests on virtual threads, and the auto-configured task executor uses them for asynchronous tasks.
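One way to confirm the setting took effect is to check Thread.currentThread().isVirtual() from inside request-handling code. The following is a minimal plain-Java sketch of that check, with no Spring involved; the class and method names are illustrative and not part of the project.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of the article's project.
final class VirtualThreadCheck {

    // Runs one task on a virtual-thread-per-task executor and reports
    // whether the thread that executed it was a virtual thread.
    static boolean runsOnVirtualThread() {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Callable<Boolean> check = () -> Thread.currentThread().isVirtual();
            return executor.submit(check).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("check failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println("virtual = " + runsOnVirtualThread());
    }
}
```

In a Spring Boot application, logging the same isVirtual() call from a controller method would show whether the property is being honored.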

AI Concurrency Advisor

We define an AiConcurrencyAdvisor interface with three implementations:

  • OpenAI client
  • Gemini client
  • Heuristic fallback

Sample JSON request body sent by the OpenAI client:

JSON
 
{
 "model":"gpt-4.1-mini",
 "temperature":0.1,
 "messages":[
   {"role":"system","content":"You are a senior JVM performance engineer…"},
   {"role":"user","content":"Operation: aggregate\nFanoutRequested: 50…"}
 ]
}


The service parses the JSON returned by the model and extracts a maxConcurrency value, which is then clamped to a safe range.
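The extraction step might look roughly like the sketch below. It assumes the model's reply contains a numeric maxConcurrency field somewhere in its text; the class name and the "reason" field in the usage example are hypothetical. A regular expression keeps the sketch dependency-free, whereas a real service would more likely use a JSON library such as Jackson.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the article's project.
final class MaxConcurrencyParser {

    // Matches "maxConcurrency": <digits> followed by a comma or closing brace.
    // Quoted, negative, fractional, or very long values do not match.
    private static final Pattern FIELD =
            Pattern.compile("\"maxConcurrency\"\\s*:\\s*(\\d{1,9})\\s*[,}]");

    // Returns the suggested limit, or empty when the reply has no usable value.
    static OptionalInt parse(String modelReply) {
        if (modelReply == null) {
            return OptionalInt.empty();
        }
        Matcher m = FIELD.matcher(modelReply);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("{\"maxConcurrency\": 12, \"reason\": \"io-bound\"}"));
        System.out.println(parse("Sorry, I cannot answer that."));
    }
}
```

Returning an empty OptionalInt, rather than throwing, lets the caller switch to the heuristic fallback when the reply is unusable.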

Bulkhead With Semaphore

After a recommendation is received:

Java
 

Semaphore semaphore = new Semaphore(maxConcurrency);


Each downstream task acquires a permit before executing and releases it afterward. At most maxConcurrency tasks therefore run at the same time, no matter how many virtual threads exist.
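To see the bulkhead in action, the sketch below starts many virtual threads behind a semaphore and records the highest number of tasks observed inside the guarded section. The class name, the counters, and the 20 ms sleep standing in for a downstream call are all illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not part of the article's project.
final class BulkheadDemo {

    // Runs `tasks` blocking jobs on virtual threads behind a semaphore and
    // returns the highest number of jobs seen inside the guarded section.
    static int peakConcurrency(int tasks, int maxConcurrency) {
        Semaphore semaphore = new Semaphore(maxConcurrency);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        semaphore.acquire();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // stand-in for a blocking downstream call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        semaphore.release();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak = " + peakConcurrency(50, 5));
    }
}
```

Even though 50 virtual threads are started, the reported peak never exceeds the 5 permits.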

Key Code Snippets

AI Advisor Interface

This abstraction keeps the AI optional, interchangeable, and safe to use.

Java
 
public interface AiConcurrencyAdvisor {
 AdvisorDecision recommend(AdvisorInput input, boolean forceAi);
}


  • Separates AI logic from business logic
  • Enables switching between Gemini, OpenAI, or a heuristic fallback
  • Keeps concurrency decisions testable and auditable
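Because every provider sits behind the same interface, a fallback can be expressed as just another implementation. The sketch below wraps a primary advisor and delegates to a second one when the first throws. The AdvisorDecision shape (including its source field) and the FallbackAdvisor class are assumptions made for illustration; the article only shows that a decision exposes maxConcurrency().

```java
import java.util.Map;

// Hypothetical sketch; types are nested so the example is self-contained.
final class AdvisorFallbackSketch {

    record AdvisorInput(String operation, int fanoutRequested,
                        long expectedDownstreamLatencyMs, int cpuCores,
                        Map<String, Object> hints) {}

    // Assumed shape: only maxConcurrency() appears in the article.
    record AdvisorDecision(int maxConcurrency, String source) {}

    interface AiConcurrencyAdvisor {
        AdvisorDecision recommend(AdvisorInput input, boolean forceAi);
    }

    // Tries the primary (AI-backed) advisor and falls back on any runtime failure.
    static final class FallbackAdvisor implements AiConcurrencyAdvisor {
        private final AiConcurrencyAdvisor primary;
        private final AiConcurrencyAdvisor fallback;

        FallbackAdvisor(AiConcurrencyAdvisor primary, AiConcurrencyAdvisor fallback) {
            this.primary = primary;
            this.fallback = fallback;
        }

        @Override
        public AdvisorDecision recommend(AdvisorInput input, boolean forceAi) {
            try {
                return primary.recommend(input, forceAi);
            } catch (RuntimeException e) {
                // Network errors, missing keys, or unparseable replies end up here.
                return fallback.recommend(input, forceAi);
            }
        }
    }
}
```

In tests, both delegates can be plain lambdas, which is one way the interface keeps concurrency decisions easy to verify.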

Advisor Input Model

The quality of AI decisions depends on the context you give.

Java
 
public record AdvisorInput(
   String operation,
   int fanoutRequested,
   long expectedDownstreamLatencyMs,
   int cpuCores,
   Map<String, Object> hints
) {}


Rather than guessing concurrency limits, we give the model:

  • Fan-out size
  • Latency expectations
  • CPU capacity
  • Workload hints

These are the same inputs a senior engineer would weigh when reasoning about concurrency.
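The same inputs can also drive a heuristic fallback. The article does not show its heuristic, so the sketch below swaps in the classic pool-sizing rule of thumb, cores × (1 + wait time / compute time), capped by the requested fan-out. The class name and the assumed 5 ms of CPU time per call are placeholders, not measured values.

```java
// Hypothetical heuristic, not the article's implementation.
final class HeuristicSketch {

    // Assumed CPU time spent per downstream call; a placeholder value.
    static final long ASSUMED_CPU_MS_PER_CALL = 5;

    // Estimates a concurrency limit from the advisor inputs and keeps it
    // within [1, fanoutRequested].
    static int suggest(int fanoutRequested, long expectedDownstreamLatencyMs, int cpuCores) {
        long waitToCompute = Math.max(0, expectedDownstreamLatencyMs) / ASSUMED_CPU_MS_PER_CALL;
        long estimate = (long) Math.max(1, cpuCores) * (1 + waitToCompute);
        return (int) Math.max(1, Math.min(estimate, fanoutRequested));
    }

    public static void main(String[] args) {
        // 2 cores, 10 ms latency: 2 * (1 + 10 / 5) = 6
        System.out.println(suggest(50, 10, 2));
    }
}
```

Slow, I/O-bound calls yield a higher estimate than fast ones, which matches the intuition that waiting threads cost little CPU.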

AI Decision Sanitization

Even AI recommendations must be constrained.

Java
 
int maxConcurrency = Math.max(1, Math.min(decision.maxConcurrency(), fanout));


  • Stops uncontrolled concurrency
  • Safeguards downstream systems
  • Guarantees AI output adheres to system rules

AI provides advice; the system makes the decision.
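A slightly extended version of the clamp is sketched below. It adds a service-wide hard cap on top of the fan-out bound; the hardCap parameter and the class name are hypothetical additions, not part of the article's code.

```java
// Hypothetical helper extending the one-line clamp shown in the article.
final class DecisionSanitizer {

    // Keeps the suggestion within [1, min(fanout, hardCap)].
    static int sanitize(int suggested, int fanout, int hardCap) {
        int upper = Math.max(1, Math.min(fanout, hardCap));
        return Math.max(1, Math.min(suggested, upper));
    }

    public static void main(String[] args) {
        System.out.println(sanitize(500, 20, 64)); // capped by the fan-out
        System.out.println(sanitize(-3, 20, 64));  // raised to the floor of 1
        System.out.println(sanitize(40, 100, 16)); // capped by the hard cap
    }
}
```

A hard cap of this kind protects downstream systems even when both the model's suggestion and the requested fan-out are large.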

Service fan-out logic:

Java
 
try (ExecutorService vtExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
 List<CompletableFuture<DownstreamResponse>> futures = new ArrayList<>(fanout);
 AtomicInteger idx = new AtomicInteger(0);

 for (int i = 0; i < fanout; i++) {
   futures.add(CompletableFuture.supplyAsync(() -> {
     boolean acquired = false;
     try {
       // Block until the bulkhead has a free permit.
       semaphore.acquire();
       acquired = true;

       int n = idx.incrementAndGet();
       return downstream.call("ds-" + n, id, latencyMs);
     } catch (InterruptedException e) {
       Thread.currentThread().interrupt();
       return new DownstreamResponse("interrupted", "INTERRUPTED", 0);
     } finally {
       // Release only if this task actually holds a permit.
       if (acquired) semaphore.release();
     }
   }, vtExecutor).orTimeout(3, TimeUnit.SECONDS));
 }

 // Gather the responses; a timed-out future makes join() throw a CompletionException.
 List<DownstreamResponse> responses = futures.stream()
     .map(CompletableFuture::join)
     .toList();
}


This approach combines virtual threads with a bulkhead, allowing for safe scaling of blocking calls.
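One detail worth handling explicitly is what happens when orTimeout fires. The sketch below converts a timeout into a fallback response rather than letting join() throw. The DownstreamResponse field names, the "TIMEOUT" and "FAILED" statuses, and the helper method are assumptions for illustration; the sleep stands in for a real downstream call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not part of the article's project.
final class TimeoutFallbackSketch {

    // Assumed shape, inferred from the three-argument constructor in the article.
    record DownstreamResponse(String name, String status, int latencyMs) {}

    // Simulates one downstream call taking workMs and gives up after timeoutMs.
    static DownstreamResponse callWithTimeout(long workMs, long timeoutMs) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return CompletableFuture.supplyAsync(() -> {
                        try {
                            Thread.sleep(workMs); // stand-in for blocking I/O
                            return new DownstreamResponse("ds-1", "OK", (int) workMs);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return new DownstreamResponse("ds-1", "INTERRUPTED", 0);
                        }
                    }, executor)
                    .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                    .exceptionally(ex -> {
                        // Unwrap CompletionException so the root cause decides the status.
                        Throwable root = (ex instanceof CompletionException && ex.getCause() != null)
                                ? ex.getCause() : ex;
                        String status = (root instanceof TimeoutException) ? "TIMEOUT" : "FAILED";
                        return new DownstreamResponse("ds-1", status, 0);
                    })
                    .join();
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(10, 2000).status());
        System.out.println(callWithTimeout(400, 50).status());
    }
}
```

Note that orTimeout only completes the future exceptionally; it does not interrupt the underlying task, which keeps running (and keeps its semaphore permit, if it holds one) until it finishes on its own.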

Running the Project

Set optional environment variables for the AI provider.

Execute:

Shell
 
./gradlew bootRun


Test the endpoints:

Shell
 
curl "http://localhost:8080/api/aggregate?id=123&fanout=20"

Append &forceAi=true to the URL to force the AI advisor path even when no API key is set.


When to Utilize AI-Driven Concurrency

This approach is particularly beneficial when:

  • There is variability downstream
  • Latency patterns are uncertain
  • Manual adjustments are expensive
  • You need clear backpressure choices

AI suggestions must always be clamped and validated against heuristics so that the system stays safe when an LLM response is malformed or unexpected.

Conclusion

This proof of concept shows how AI (Gemini/OpenAI) can help with Spring Boot concurrency design. It does not replace human judgment but provides contextual recommendations based on workload characteristics. When paired with Java 21 virtual threads, this method allows for scalable, safe, and observable microservices.
