DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • How AI Coding Assistants Are Changing Developer Flow
  • How AI Is Rewriting Full-Stack Java Systems: Practical Patterns with Spring Boot, Kafka and WebSockets
  • Building Scalable Agentic Assistants: A Graph-Based Approach
  • Zero-Cost AI with Java

Trending

  • Testing AI-Infused Apps: A Dual-Layer Framework for AI Quality Assurance
  • Dear Micromanager: Your Distrust Has a Job; It’s Just Not the One You’re Doing
  • Master-Class: Understanding Database Replication (Single, Multi, and Leaderless)
  • How to Write for DZone Publications: Trend Reports and Refcards
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Beyond a Single AI Assistant: Creating a Team Chat App Using Spring Boot and LangChain4j

Beyond a Single AI Assistant: Creating a Team Chat App Using Spring Boot and LangChain4j

Learn how to create a team chat application using Spring Boot and LangChain4j, featuring two AI assistants that communicate both with you and with each other.

By 
Tamas Kiss user avatar
Tamas Kiss
·
Oct. 06, 25 · Tutorial
Likes (0)
Comment
Save
Tweet
Share
916 Views

Join the DZone community and get the full member experience.

Join For Free

Nowadays, wherever we look, AI chat solutions are everywhere. From customer support bots handling inquiries to personal assistants scheduling meetings, AI-powered conversations have become an integral part of digital interactions. Developers experiment with different models, fine-tuning prompts, integrating memory, or giving assistants unique personalities to enhance user experience. However, most of these solutions follow the same fundamental pattern: a single AI assistant engaging with a human user.

Featured image
If I ask anyone whether a conversation with ChatGPT feels realistic, they’ll likely say yes without hesitation, perhaps adding that it sometimes confidently states incorrect information or struggles with counting the "r"s in "strawberry." Yet the most obvious imperfection, its unwavering willingness to answer every question instantly and politely, rarely raises any eyebrows. A chat interaction between two people is never this effortless and seamless. Many factors influence when and how someone responds. They might be busy with other tasks, waiting for you to clarify your question, or unsure of their own answer. Their response could also depend on their level of interest in the topic, their relationship with you, or even external distractions pulling their attention away.

Or take a more everyday example: You forget to take out the trash (again), and your wife refuses to speak to you, leaving you with no clue what went wrong.

Husband and wife


That sort of thing could never happen with ChatGPT. It wouldn’t just tell you that you forgot the trash, it would also kindly explain how to take it out.

Of course, we accept the imperfections of AI conversations because they are convenient for us, and our goal is to receive answers to our questions. But now, let's imagine this same convenient imperfection in a team chat with 10 or 20 participants. Every time you say something, every participant immediately responds, regardless of whether their answer is useful or even directed at them. On top of that, since they are all so polite and eager to help, they start replying to each other’s messages as well, turning the conversation into an overwhelming flood of endless responses.

This simply wouldn’t work. But I’d still like to show you how I tried to build a chat application using Spring Boot and LangChain4j, where you, the user, can take part in a team chat with two AI assistants designed to behave in a way that closely resembles real-world conversational dynamics.

AI Assistant Answer Relevance 

When you send a message to a chat model through an API, you always get a response, simply because that's how the request-response mechanism works. But let’s go back to the earlier example, where the wife refuses to talk to her husband because he forgot to take the trash out, and take it a step further.

Imagine the conversation went like this:

Husband: "What the hell did I do again?"

Wife: "You didn't take the trash out again, but you can ignore what I just said."

Now, let’s say that because of that final sentence ("you can ignore what I just said"), the response never actually reaches the husband, and is even erased from the wife's memory. Functionally, this is no different from her never responding at all.

The point of this example is to show that, while we can’t stop the AI from producing a response, we can ask it to evaluate how relevant its reply is to the message it received. If that relevance is expressed numerically (e.g., on a scale from 0-100), we can compare it against a configurable threshold. If the score falls below the threshold, the response is discarded. Otherwise, it is shown. This allows us to mimic the kind of unpredictability we often see in human conversations.

Answer relevance filtering

Relevancy flow

To achieve this kind of response unpredictability, I initially relied on a single relevance score provided by the assistant. However, I quickly ran into a problem: No matter how carefully I explained the concept in the prompt, the AI produced highly inconsistent values. For the same message, it could easily return scores that differed by 30 points or more, making it impossible to trust the threshold as a reliable filter.

To fix this, I broke the relevance evaluation down into eight more precisely defined aspects. Each one is rated separately on a scale from 0-10. These values are then calculated into a single percentage score, which proved to be far more stable and predictable. With that in place, I could finally apply the threshold with confidence.

The following table lists the relevance aspects used in this evaluation:

Aspect Description
Relevance to profession Rates how relevant the message is to the AI's professional expertise (1 = not related, 10 = highly relevant)
Relevance to interests Measures how much the message aligns with the AI's personal interests and preferred discussion topics (1 = unrelated, 10 = strongly aligned)
Question type Classifies whether the message is purely factual or highly opinion based (1 = factual, 10 = subjective)
Confidence in response Indicates how confident the AI is in providing an accurate or meaningful response (1 = low confidence, 10 = high confidence)
Context availability Assesses how much relevant context is available to understand the message (1 = no context, 10 = fully clear)
Urgency level Rates how time sensitive the message appears to be (1 = not urgent, 10 = highly urgent)
Relation to previous chat Evaluates how strongly the message ties into previous interactions (1 = completely new topic, 10 = highly related)
Alignment with personality Measures how well the message fits the AI's defined personality, tone, or conversational style (1 = not aligned, 10 = strongly aligned)


With the new aspect-based approach, the relevance scores became much more consistent.

But this brought me to another challenge involving the threshold itself. General messages (e.g., "How are you today?") consistently received lower relevance scores than those that were more closely related to the assistant’s personality or expertise. If I set the threshold too high, these everyday conversational messages were often filtered out and never reached the user. But if I set it too low, a single thought-provoking message could trigger an endless chain of responses.

To address this, I asked the AI to score relevance and to categorize each message using one of the following categories:

Message category description
GENERAL_INQUIRY The message is purely social, small talk, or general pleasantries (e.g., greetings, well-being questions like "How are you?"). It does not relate to your expertise or interests in any meaningful way.
INTEREST_CASUAL The message is part of a casual conversation that lightly touches on your interests but does not require an in-depth response. It is more of a social exchange rather than an analytical discussion.
INTEREST_DISCUSSION The message is about a topic related to your interests and invites a meaningful discussion but does not necessarily require expertise.
EXPERTISE_RELATED The message is directly related to your expertise, profession, or strong personal interests and requires a knowledgeable or analytical response.
OPINION_REQUEST The message asks for your personal opinion, belief, or subjective perspective on a topic, regardless of your expertise.


By assigning different thresholds to each message category, I was able to create a much more balanced and natural conversational experience.

Jumping into the code, along with the response itself, we expect the AI to return the following RelevanceMetadata object:

Java
 
    @Description("""
            When reflecting on a question or statement, you must determine the different aspects of
            the relevance of your answer.
            """)
    record RelevanceMetadata(
            @Description("Rate how relevant this message is to your professional expertise on a scale from 1 (not related) to 10 (highly relevant)")
            int relevanceToProfession,
            @Description("On a scale from 1 to 10, rate how much this message aligns with your personal interests and topics you enjoy discussing.")
            int relevanceToInterests,
            @Description("Classify this message from 1 (purely factual) to 10 (highly opinion-based or subjective).")
            int questionType,
            @Description("On a scale from 1 to 10, rate your confidence in providing an accurate or meaningful response to this message.")
            int confidenceInResponse,
            @Description("Rate how much relevant context is available to understand this message, from 1 (no context) to 10 (fully clear).")
            int contextAvailability,
            @Description("On a scale from 1 (not urgent) to 10 (highly time-sensitive), rate how urgent this message appears to be.")
            int urgencyLevel,
            @Description("On a scale from 1 (completely new topic) to 10 (strongly tied to previous chat messages), rate how related this message is to past interactions.")
            int relationToPreviousChat,
            @Description("On a scale from 1 to 10, rate how much this message fits your defined personality, tone, or conversational style.")
            int alignmentWithPersonality
    ) {
    }


Additionally, each message must be classified using the following MessageCategory enum:

Java
 
    @Description("The category of the message you received.")
    enum MessageCategory {
        @Description("The message is purely social, small talk, or general pleasantries (e.g., greetings, well-being questions like 'How are you?'). It does not relate to your expertise or interests in any meaningful way.")
        GENERAL_INQUIRY,

        @Description("The message is part of a casual conversation that lightly touches on your interests but does not require an in-depth response. It is more of a social exchange rather than an analytical discussion.")
        INTEREST_CASUAL,

        @Description("The message is about a topic related to your interests and invites a meaningful discussion but does not necessarily require expertise.")
        INTEREST_DISCUSSION,

        @Description("The message is directly related to your expertise, profession, or strong personal interests and requires a knowledgeable or analytical response.")
        EXPERTISE_RELATED,

        @Description("The message asks for your personal opinion, belief, or subjective perspective on a topic, regardless of your expertise.")
        OPINION_REQUEST
    }


The Assistant interface looks as follows. In the prompt, the AI receives a message in the form of a notification representing a new message in the team chat. Along with its response, it is expected to provide both the previously mentioned MessageCategory and the RelevanceMetadata:

Java
 
public interface Assistant {

    @UserMessage("""
               MESSAGE_NOTIFICATION
               {
               message:'{{message}}',
               messageSender:'{{userName}}',
               conversationThreadLifecycle:'{{conversationThreadState}}',
               directlyAddressedParticipant:'{{directlyAddressedParticipant}}'
               }
            """)
    ResponseMessageWithRelevance send(@V("message") String message,
                                      @V("conversationThreadState") ConversationThreadState conversationThreadState,
                                      @V("directlyAddressedParticipant") String directlyAddressedParticipant,
                                      @V("userName") String userName
    );


    record ResponseMessageWithRelevance(@Description("Your name") String name,
                                        @Description("Your answer to the received message") String answer,
                                        @Description("The name of the conversion participant, who sent you the message.") String responseTo,
                                        @Description("Is your response message intended to close the conversation thread?") boolean closingConversationThread,
                                        @Description("The value of the directlyAddressedParticipant field in the message notification") String directlyAddressedParticipant,
                                        MessageCategory messageCategory,
                                        @Description("Describe here why did you choose that messageCategory.") String messageCategoryDescription,
                                        RelevanceMetadata relevance
    ) {
    }
}


Note: Some fields are not explained here. For now, we have only focused on the answer, messageCategory, and relevance fields. The rest will be covered in later sections.

Message Routing Between Participants

Until now, we’ve looked at how to make an individual assistant’s responses feel more selective and human-like. Now, let’s see how this idea can be extended to a full team chat. At the core of the implementation, there is a MessagePublisher that distributes each message to a dedicated MessageConsumer assigned to each participant in the conversation.

Whenever a user sends a message, the MessagePublisher forwards it to both AI assistants through their respective consumers. When an assistant replies, the response is evaluated for relevance. If it passes the relevance check, it is routed through the MessagePublisher and delivered to both the user and the other assistant.

Message routing

Example message routing


Each of the three participants has its own configuration. All participants have a name (which has to be unique to differentiate the participants).

The AI assistants additionally have:

  • A model name (in this case, gpt-4o-mini)
  • A persona description
  • A system message based on a shared base template

Here's an example:

YAML
 
  user:
    description:
      name: "Tom"
  assistant:
    first:
      model-name: "gpt-4o-mini"
      description:
        name: "Bill"
        persona: "
        Age: 72. 
        Occupation: Retired Cinematographer (50 years in the industry). 
        Personality: Cynical, nostalgic, grumpy, but deeply knowledgeable about classic cinema. 
        Interests: -Golden Age Hollywood films (1940s–1970s), 
                   -film lighting techniques and practical effects, 
                   -black-and-white cinematography, 
                   -classic directors (Hitchcock, Kubrick, Fellini, Tarkovsky), 
                   -complaining about modern CGI and 'soulless' blockbusters, 
                   -drinking whiskey while watching old film reels, 
        Dislikes: -Superhero movies ('They all look the same!'), 
                  -overuse of CGI ('Back in my day, we built real sets!'), 
                  -streaming services ('Films should be seen on the big screen!'), 
                  -'TikTok filmmakers' and YouTube critics"
      system-message: "
        Your name is ${multi-assistant.assistant.first.description.name}.
        Your persona: ${multi-assistant.assistant.first.description.persona}
        
        ${multi-assistant.communication-config.base-system-message}
      "
    second:
      model-name: "gpt-4o-mini"
      description:
        name: "Joe"
        persona: "
        Age: 24. 
        Occupation: Aspiring Thrash Metal Guitarist. 
        Personality: Energetic, sarcastic, rebellious, always looking for a joke in every situation. 
        Interests: -Thrash metal (Slayer, Metallica, Megadeth, Anthrax), 
                   -shredding guitar solos, 
                   -mosh pits, 
                   -collecting vintage band T-shirts, 
                   -making fun of everything (especially serious people), 
                   -turning normal conversations into jokes, 
                   -fast food, 
                   -late-night gaming sessions, 
                   -loud concerts, 
                   -headbanging until his neck hurts. 
        Dislikes: -Classical music snobs, 
                  -pop music, 
                  -people who take themselves too seriously, 
                  -mornings, 
                  -silence, 
                  -slow ballads, 
                  -and anyone who says 'metal is just noise."
      system-message: "
        Your name is ${multi-assistant.assistant.second.description.name}.
        Your persona: ${multi-assistant.assistant.second.description.persona}
        
        ${multi-assistant.communication-config.base-system-message}
      "


The base system message looks like this:

YAML
 
    base-system-message: "
      There are three of you in a team chat. All the participants has a name.
      The participants of the conversation are:
        -name: ${multi-assistant.user.description.name}, 
        -name: ${multi-assistant.assistant.first.description.name}, 
        -name: ${multi-assistant.assistant.second.description.name}
        
       If a new message appears in the chat, then you will be notified about that. 
       This notification will contain 
         -the message itself(message field), 
         -the name of the message sender(messageSender field),
         -and the state of the current conversation thread(conversationThreadLifecycle field)
         -and the name of the chat participant who was directly addressed with the message in the directlyAddressedParticipant field(if there is no one directly addressed, the field is empty).
         
       The length of a conversation thread is limited. Until you reach this limit, 
       the conversationThreadLifecycle is set to ACTIVE and you can continue the conversation. 
       Once you near to reach the limit, the status changes from ACTIVE to FINALIZE. 
       The FINALIZE status tells you when you should try to close the conversation on the topic.
       
       An example for a  message notification where you can continue the conversation:
        '
          MESSAGE_NOTIFICATION
          message:'How are you today?'
          messageSender:'Tom'
          conversationThreadLifecycle:'ACTIVE'
          directlyAddressedParticipant:''
        '
        
        An example for a  message notification where you should try to close the conversation:
        '
          MESSAGE_NOTIFICATION
          message:'How are you today?'
          messageSender:'Tom'
          conversationThreadLifecycle:'FINALIZE'
          directlyAddressedParticipant:'Tom'
        '
        
        If you want to reply to the message, you can, but you should always determine 
        the relevance of your reply to the original message you are replying to.
        The relevance value you set determines whether your answer will appear in the team chat.
      "


As you can see from the configuration above, the two default AI personalities we'll be using are a grumpy old cinematographer and a young thrash metal fanatic.

The interface of a MessageConsumer looks like this:

Java
 
 /**
 * This interface represents a consumer of messages in a chat system.
 * Implementations of this interface are responsible for handling and processing messages
 * received from a message publisher. The consumer operates in a separate thread of execution.
 * Implementing classes should define the logic for consuming messages and specify the
 * participant name to identify the chat participant associated with this consumer.
 */
public interface MessageConsumer extends Consumer<Message>, Runnable {

    /**
     * The name of the chat participant, who consumes the messages with the particular MessageConsumer instance.
     *
     * @return participant name
     */
    String participantName();
}


As you can see, it will implement the Runnable interface also, as all the MessageConsumers will run in its dedicated thread, and it has to provide the name of the participant who will consume the messages through this consumer.

The only implementation of MessageConsumer, from which each participant has its own instance, looks as follows:

Java
 
public class DefaultMessageConsumer implements MessageConsumer {
    private static final Logger LOGGER = LoggerFactory.getLogger(DefaultMessageConsumer.class);
    private final LinkedBlockingDeque<Message> messageQueue = new LinkedBlockingDeque<>();
    private final String participantName;
    private final ConsumedMessageProcessor consumedMessageProcessor;

    public DefaultMessageConsumer(String participantName, ConsumedMessageProcessor consumedMessageProcessor) {
        this.participantName = participantName;
        this.consumedMessageProcessor = consumedMessageProcessor;
    }

    @Override
    public String participantName() {
        return participantName;
    }

    @Override
    public void accept(Message message) {
        if (!participantName().equals(message.getName())) {
            try {
                if (message instanceof PoisonPillMessage) {
                    messageQueue.putFirst(message);
                } else {
                    messageQueue.put(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                LOGGER.warn("Message accepting thread {} was interrupted!", Thread.currentThread().getName());
            }
        }
    }

    @Override
    public void run() {
        try {
            while (true) {
                var message = messageQueue.take();
                if (message instanceof PoisonPillMessage) {
                    LOGGER.info("Poison pill message received at thread {}", Thread.currentThread().getName());
                    break;
                }
                consumedMessageProcessor.process(message);
            }
            LOGGER.info("Thread {} has finished message processing.", Thread.currentThread().getName());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            LOGGER.warn("Message processing thread {} was interrupted!", Thread.currentThread().getName());
        }
    }
}


This implementation continuously pulls messages from its dedicated messageQueue for further processing, using the configured ConsumedMessageProcessor. The loop runs until it receives a special PoisonPillMessage, which signals it to stop.

There are two ConsumedMessageProcessor implementations in the application:

  1. AssistantConsumedMessageProcessor, which is responsible for delivering messages to AI assistants, filtering their responses based on relevance, and forwarding the valid ones to other participants.
  2. MessageStreamConsumedMessageProcessor, which sends messages to the user via an SseEmitter.

The MessageStreamConsumedMessageProcessor:

Java
 
public class MessageStreamConsumedMessageProcessor implements ConsumedMessageProcessor {
    private static final Logger LOGGER = LoggerFactory.getLogger(MessageStreamConsumedMessageProcessor.class);
    private final MessageStreamService messageStreamService;

    public MessageStreamConsumedMessageProcessor(MessageStreamService messageStreamService) {
        this.messageStreamService = messageStreamService;
    }

    @Override
    public void process(Message message) {
        try {
            messageStreamService.getStream()
                    .orElseThrow(()->new RuntimeException("Stream not initialized!"))
                    .send(MessageHtmlUtil.messageToAiHtmlChatItem(message));
        } catch (Exception e) {
            LOGGER.error("Failed to send message to stream", e);
        }
    }
}


The AssistantConsumedMessageProcessor:

Java
 
public class AssistantConsumedMessageProcessor implements ConsumedMessageProcessor {
    private static final Logger LOGGER = LoggerFactory.getLogger(AssistantConsumedMessageProcessor.class);
    private String assistantName;
    private Assistant assistant;
    private MessagePublisher messagePublisher;
    private RelevanceCalculator relevanceCalculator;
    private ConversationThreadHandler conversationThreadHandler;
    private ModeratorAssistant moderatorAssistant;

    public void setAssistantName(String assistantName) {
        this.assistantName = assistantName;
    }

    public void setAssistant(Assistant assistant) {
        this.assistant = assistant;
    }

    public void setMessagePublisher(MessagePublisher messagePublisher) {
        this.messagePublisher = messagePublisher;
    }

    public void setRelevanceCalculator(RelevanceCalculator relevanceCalculator) {
        this.relevanceCalculator = relevanceCalculator;
    }

    public void setConversationThreadHandler(ConversationThreadHandler conversationThreadHandler) {
        this.conversationThreadHandler = conversationThreadHandler;
    }

    public void setModeratorAssistant(ModeratorAssistant moderatorAssistant) {
        this.moderatorAssistant = moderatorAssistant;
    }

    @Override
    public void process(Message message) {
        try {
            if (conversationThreadHandler.isConversationThreadActive(message.getConversationThreadId()) &&
                    conversationThreadHandler.isConversationThreadOpenForAssistant(message.getConversationThreadId(), assistantName)) {

                var conversationThreadLimitReached = conversationThreadHandler.isConversationThreadLimitReached(message.getConversationThreadId(), assistantName);
                var directlyAddressedParticipant = determineDirectlyAddressedParticipant(message);
                var result = assistant.send(
                        message.getMessage(),
                        conversationThreadLimitReached ? Assistant.ConversationThreadState.FINALIZE : Assistant.ConversationThreadState.ACTIVE,
                        directlyAddressedParticipant,
                        message.getName());

                if (LOGGER.isDebugEnabled()) {
                    LOGGER.debug("Response message was {} to request message {} ", result, message);
                }

                if (conversationThreadLimitReached) {
                    conversationThreadHandler.closeConversationThreadForAssistant(message.getConversationThreadId(), assistantName);
                }

                if (relevanceCalculator.isResponseRelevant(result)) {
                    messagePublisher.publish(new Message()
                            .setConversationThreadId(message.getConversationThreadId())
                            .setName(result.name())
                            .setMessage(result.answer())
                    );
                }

            }

        } catch (Exception e) {
            LOGGER.error("Message processing failed:", e);
        }
    }

    private String determineDirectlyAddressedParticipant(Message message) {
        var directlyAddressedParticipant = moderatorAssistant.analyze(message.getMessage(), message.getName()).directlyAddressedTo();
        return directlyAddressedParticipant == null ? "" : directlyAddressedParticipant;
    }
}


Note: The above code includes some additional logic and details that will be explained in later sections.

The message publisher, which will send the messages to the subscribed consumers, looks like this:

Java
 
@Component
public class DefaultMessagePublisher implements MessagePublisher {
    private List<MessageConsumer> messageConsumers;
    private ConversationThreadHandler conversationThreadHandler;

    @Autowired
    public void setMessageConsumers(List<MessageConsumer> messageConsumers) {
        this.messageConsumers = messageConsumers;
    }

    @Autowired
    public void setConversationThreadHandler(ConversationThreadHandler conversationThreadHandler) {
        this.conversationThreadHandler = conversationThreadHandler;
    }

    @Override
    public void publish(Message message) {
        conversationThreadHandler.incrementConversationThreadCounter(message);
        messageConsumers.forEach(messageConsumer -> messageConsumer.accept(message));
    }
}


This is how the three MessageConsumers are configured:

Java
 
    @Bean
    MessageConsumer userMessageConsumer(@Qualifier("messageConsumptionTaskExecutor") TaskExecutor taskExecutor,
                                        UserConfigurationProperties userConfig,
                                        @Qualifier("userConsumedMessageProcessor") ConsumedMessageProcessor consumedMessageProcessor) {
        var messageConsumer = new DefaultMessageConsumer(userConfig.description().name(), consumedMessageProcessor);
        taskExecutor.execute(messageConsumer);
        return messageConsumer;
    }
	
    @Bean
    MessageConsumer firstAssistantMessageConsumer(@Qualifier("messageConsumptionTaskExecutor") ThreadPoolTaskExecutor taskExecutor,
                                                  FirstAssistantConfigurationProperties assistantConfig,
                                                  @Qualifier("firstAssistantMessageProcessor") ConsumedMessageProcessor consumedMessageProcessor) {
        var messageConsumer = new DefaultMessageConsumer(assistantConfig.description().name(), consumedMessageProcessor);
        taskExecutor.execute(messageConsumer);
        return messageConsumer;
    }
	
    @Bean
    MessageConsumer secondAssistantMessageConsumer(@Qualifier("messageConsumptionTaskExecutor") ThreadPoolTaskExecutor taskExecutor,
                                                   SecondAssistantConfigurationProperties assistantConfig,
                                                   @Qualifier("secondAssistantMessageProcessor") ConsumedMessageProcessor consumedMessageProcessor) {
        var messageConsumer = new DefaultMessageConsumer(assistantConfig.description().name(), consumedMessageProcessor);
        taskExecutor.execute(messageConsumer);
        return messageConsumer;
    }
	
    @Bean("messageConsumptionTaskExecutor")
    ThreadPoolTaskExecutor messageConsumptionTaskExecutor() {

        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(3);
        executor.setMaxPoolSize(3);
        executor.setThreadNamePrefix("message_consumer_");
        executor.setVirtualThreads(true);
        executor.initialize();

        return executor;
    }


The configuration of the two assistants with their ChatMemory and ChatLanguageModel:

Java
 
    @Bean("firstAssistantChatMemory")
    ChatMemory firstAssistantChatMemory(FirstAssistantConfigurationProperties firstAssistantConfiguration,
                                        ObjectMapper objectMapper,
                                        RelevanceCalculator relevanceCalculator) {
        return new RelevanceFilterChatMemoryProxy(TokenWindowChatMemory.builder()
                .chatMemoryStore(new InMemoryChatMemoryStore())
                .maxTokens(3000, new OpenAiTokenizer(firstAssistantConfiguration.modelName()))
                .build(),
                relevanceCalculator,
                objectMapper
        );
    }

    @Bean("firstAssistantChatLanguageModel")
    ChatLanguageModel firstAssistantChatLanguageModel(AiConfigurationProperties aiConfigurationProperties,
                                                      FirstAssistantConfigurationProperties firstAssistantConfiguration) {
        return OpenAiChatModel.builder()
                .apiKey(aiConfigurationProperties.apiKey())
                .modelName(firstAssistantConfiguration.modelName())
                .temperature(0.5)
                .logRequests(true)
                .logResponses(true)
                .build();
    }

    @Bean("firstAssistant")
    Assistant firstAssistant(@Qualifier("firstAssistantChatLanguageModel") ChatLanguageModel chatLanguageModel,
                             @Qualifier("firstAssistantChatMemory") ChatMemory chatMemory,
                             FirstAssistantConfigurationProperties firstAssistantConfiguration) {
        return AiServices
                .builder(Assistant.class)
                .chatLanguageModel(chatLanguageModel)
                .systemMessageProvider(memoryId -> firstAssistantConfiguration.systemMessage())
                .chatMemory(chatMemory)

                .build();
    }

    @Bean("secondAssistantChatMemory")
    ChatMemory secondAssistantChatMemory(SecondAssistantConfigurationProperties secondAssistantConfiguration,
                                         ObjectMapper objectMapper,
                                         RelevanceCalculator relevanceCalculator) {
        return new RelevanceFilterChatMemoryProxy(TokenWindowChatMemory.builder()
                .chatMemoryStore(new InMemoryChatMemoryStore())
                .maxTokens(3000, new OpenAiTokenizer(secondAssistantConfiguration.modelName()))
                .build(),
                relevanceCalculator,
                objectMapper
        );
    }

    @Bean("secondAssistantChatLanguageModel")
    ChatLanguageModel secondAssistantChatLanguageModel(AiConfigurationProperties aiConfigurationProperties,
                                                       SecondAssistantConfigurationProperties secondAssistantConfiguration) {
        return OpenAiChatModel.builder()
                .apiKey(aiConfigurationProperties.apiKey())
                .modelName(secondAssistantConfiguration.modelName())
                .temperature(0.5)
                .logRequests(true)
                .logResponses(true)
                .build();
    }

    @Bean("secondAssistant")
    Assistant secondAssistant(@Qualifier("secondAssistantChatLanguageModel") ChatLanguageModel chatLanguageModel,
                              @Qualifier("secondAssistantChatMemory") ChatMemory chatMemory,
                              SecondAssistantConfigurationProperties secondAssistantConfiguration) {
        return AiServices
                .builder(Assistant.class)
                .chatLanguageModel(chatLanguageModel)
                .systemMessageProvider(memoryId -> secondAssistantConfiguration.systemMessage())
                .chatMemory(chatMemory)
                .build();
    }


As shown in the configuration above, we're not simply using a TokenWindowChatMemory. Instead, it is wrapped by a RelevanceFilterChatMemoryProxy. The purpose of this proxy is to prevent irrelevant responses from being added to an assistant’s memory.

Think back to the earlier husband and wife example. It is not enough that the husband doesn’t hear the irrelevant response. It also shouldn’t end up in the wife's memory if we want to treat it as if it never happened.

The RelevanceFilterChatMemoryProxy itself looks like this:

Java
 
public class RelevanceFilterChatMemoryProxy implements ChatMemory {
    private static final Logger LOGGER = LoggerFactory.getLogger(RelevanceFilterChatMemoryProxy.class);
    private static final Pattern JSON_BLOCK_PATTERN = Pattern.compile("(?s)\\{.*\\}|\\[.*\\]");
    private final ChatMemory wrappedChatMemory;
    private final RelevanceCalculator relevanceCalculator;
    private final ObjectMapper objectMapper;

    public RelevanceFilterChatMemoryProxy(ChatMemory wrappedChatMemory, RelevanceCalculator relevanceCalculator, ObjectMapper objectMapper) {
        this.wrappedChatMemory = wrappedChatMemory;
        this.relevanceCalculator = relevanceCalculator;
        this.objectMapper = objectMapper;
    }

    @Override
    public Object id() {
        return wrappedChatMemory.id();
    }

    @Override
    public void add(ChatMessage message) {
        if (isMessageRelevant(message)) {
            wrappedChatMemory.add(message);
        }
    }

    @Override
    public List<ChatMessage> messages() {
        return wrappedChatMemory.messages();
    }

    @Override
    public void clear() {
        wrappedChatMemory.clear();
    }

    /**
     * If the given message is an AiMessage and the contained text is a ResponseMessageWithRelevance,
     * then it compares the received relevance value with the configured relevance threshold.
     * If the relevance falls below the threshold, the message is not relevant. Otherwise, it is relevant.
     *
     * @param chatMessage the received message
     * @return true if the message is relevant. Otherwise, returns false
     */
    private boolean isMessageRelevant(ChatMessage chatMessage) {
        if (chatMessage instanceof AiMessage aiMessage) {
            var aiMessageText = extractJsonBlock(aiMessage.text());
            try {
                var messageWithRelevance = objectMapper.readValue(aiMessageText, Assistant.ResponseMessageWithRelevance.class);
                if (!relevanceCalculator.isResponseRelevant(messageWithRelevance)) {
                    return false;
                }
            } catch (JsonProcessingException e) {
                LOGGER.warn("AiMessage was not a valid ResponseMessageWithRelevance:{}", aiMessageText);
            }
        }

        return true;
    }

    /**
     * Extracts the JSON block from the given text.
     *
     * @param text the text
     * @return the JSON block from the text
     */
    private String extractJsonBlock(String text) {
        Matcher matcher = JSON_BLOCK_PATTERN.matcher(text);
        if (matcher.find()) {
            return matcher.group();
        }
        return text;
    }
}


The RelevanceCalculator, which is used to calculate if a response is relevant — and is used here and at the AssistantConsumedMessageProcessor, looks like this:

Java
 
@Component
public class RelevanceCalculator {
    private static final Logger LOGGER = LoggerFactory.getLogger(RelevanceCalculator.class);
    private final CommunicationConfigurationProperties.AnswerRelevanceThresholdConfig answerRelevanceThresholdConfig;
    private final WeakHashMap<Assistant.ResponseMessageWithRelevance, Boolean> relevanceCache = new WeakHashMap<>();

    public RelevanceCalculator(CommunicationConfigurationProperties communicationConfiguration) {
        this.answerRelevanceThresholdConfig = communicationConfiguration.answerRelevanceThreshold();
    }

    /**
     * Checks if the given response message is relevant enough to be sent to the other participants of the conversation,
     * based on its relevance metadata.
     *
     * @param responseMessageWithRelevance The response message
     * @return true if the response message is relevant, false otherwise
     */
    public boolean isResponseRelevant(Assistant.ResponseMessageWithRelevance responseMessageWithRelevance) {

        var cachedRelevance = relevanceCache.get(responseMessageWithRelevance);
        if (cachedRelevance != null) {
            return cachedRelevance;
        }

        if (responseMessageWithRelevance.closingConversationThread()) {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("The message is a conversation thread closing message, so it is relevant: {}", responseMessageWithRelevance);
            }
            relevanceCache.put(responseMessageWithRelevance, true);
            return true;
        }

        if (StringUtils.hasLength(responseMessageWithRelevance.directlyAddressedParticipant()) && !responseMessageWithRelevance.name().equals(responseMessageWithRelevance.directlyAddressedParticipant())) {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("The message was directly addressed to another participant, so it is not relevant: {}", responseMessageWithRelevance);
            }
            relevanceCache.put(responseMessageWithRelevance, false);
            return false;
        }

        var relevanceMetadata = responseMessageWithRelevance.relevance();
        var relevancePercentage = new PercentageCalculator()
                .addFieldValue(relevanceMetadata.relevanceToProfession())
                .addFieldValue(relevanceMetadata.relevanceToInterests())
                .addFieldValue(relevanceMetadata.questionType())
                .addFieldValue(relevanceMetadata.confidenceInResponse())
                .addFieldValue(relevanceMetadata.contextAvailability())
                .addFieldValue(relevanceMetadata.urgencyLevel())
                .addFieldValue(relevanceMetadata.relationToPreviousChat())
                .addFieldValue(relevanceMetadata.alignmentWithPersonality())
                .calculate();

        var relevant = isRelevant(responseMessageWithRelevance.messageCategory(), relevancePercentage);
        relevanceCache.put(responseMessageWithRelevance, relevant);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("The relevance value was {} for the following response, so its relevance is {}: {}", relevancePercentage, relevant, responseMessageWithRelevance);
        }

        return relevant;
    }

    /**
     * Checks if the given message category is relevant based on the given relevance percentage.
     *
     * @param messageCategory     The message category to check
     * @param relevancePercentage The relevance percentage
     * @return true if the message category is relevant based on the given relevance percentage, false otherwise
     */
    private boolean isRelevant(Assistant.MessageCategory messageCategory, int relevancePercentage) {
        var answerRelevanceThreshold = answerRelevanceThresholdConfig.thresholds().get(messageCategory);
        if (answerRelevanceThreshold == null) {
            return false;
        }
        return relevancePercentage >= answerRelevanceThreshold;
    }

    /**
     * Calculates the relevance percentage based on the given field values.
     */
    private static class PercentageCalculator {
        private static final int MAX_VALUE_PER_FIELD = 10;
        private int maxValue = 0;
        private int sum = 0;

        PercentageCalculator addFieldValue(int value) {
            maxValue += MAX_VALUE_PER_FIELD;
            sum += Math.min(value, MAX_VALUE_PER_FIELD);
            return this;
        }

        int calculate() {
            if (maxValue == 0) {
                return 0;

            }
            return (int) Math.round((sum * 100.0) / maxValue);
        }
    }
}


AI Assistant Conversation Limit

While the relevance-based filtering combined with well-tuned thresholds helps make the conversation feel more realistic, there is still a risk that a single message and all the replies that follow it may be considered highly relevant by both assistants, resulting in a never-ending discussion between them.

To prevent this, I introduced a conversation thread limit. In the YAML configuration, this appears alongside the relevance thresholds as follows:

YAML
 
  communication-config:
    answer-relevance-threshold:
      thresholds:
        -GENERAL_INQUIRY: 39
        -INTEREST_CASUAL: 52
        -EXPERTISE_RELATED: 61
        -OPINION_REQUEST: 66
        -INTEREST_DISCUSSION: 64
    conversation-thread-limit: 3


But what exactly do we mean by a conversation thread? Every time the user sends a message, a new conversation thread is started and becomes the active one.

The configured thread limit defines how many messages an AI assistant is allowed to send after a single user message. If an assistant already has a response in progress that belongs to an inactive thread, that message will still be delivered. However, once the limit is reached, the assistant is no longer allowed to process or send new messages that belong to that thread. Still, reaching the thread limit doesn't mean the conversation is suddenly cut off. That would easily lead to situations where one assistant asks a question and the other doesn't get the chance to respond, which would feel unnatural or incomplete.

As mentioned earlier, the AI receives messages in the form of a notification, which looks like this:

Plain Text
 
MESSAGE_NOTIFICATION
{
  message:'{{message}}',
  messageSender:'{{userName}}',
  conversationThreadLifecycle:'{{conversationThreadState}}',
  directlyAddressedParticipant:'{{directlyAddressedParticipant}}'
}


In this notification, the conversationThreadLifecycle field can take one of two values, represented by the following enum:

Java
 
enum ConversationThreadState {
    ACTIVE, FINALIZE
}


The ACTIVE state means the assistant can continue contributing to the conversation,
while FINALIZE signals that the thread has reached its limit and the assistant should try to close the conversation.

In addition, the assistant is expected to indicate whether its message is meant to close the conversation using the following field in the response:

Java
 
@Description("Is your response message intended to close the conversation thread?") boolean closingConversationThread


The AI receives instructions on how to handle the conversationThreadLifecycle field in the following part of the system prompt:

Plain Text
 
The length of a conversation thread is limited. Until you reach this limit, 
the conversationThreadLifecycle is set to ACTIVE and you can continue the conversation. 
Once you near to reach the limit, the status changes from ACTIVE to FINALIZE. 
The FINALIZE status tells you when you should try to close the conversation on the topic.


Additionally, if the assistant sets the closingConversationThread field in its response to indicate that the message is a closing one, the message is always forwarded without relevance filtering to ensure that conversation-ending messages are not skipped.

Moderator AI Assistant

Have you ever tried to ask something from one person in a group, but someone else jumped in and answered instead? It can be a little awkward, especially when the question wasn’t meant for them. The same thing can happen here. Even if you clearly address your message to one of the assistants, the other might still respond, especially if the message matches their personality or interests.

To solve this, the first step is determining whether a message is clearly addressed to a specific participant. Initially, I tried to let the assistants themselves make that decision, but the results were too inconsistent. I ended up separating this logic into a dedicated Moderator Assistant, whose only task is to evaluate each message and decide whether it is explicitly addressed to someone, and if so, to whom.

The interface of this Moderator Assistant looks like this:

Java
 
public interface ModeratorAssistant {

    @UserMessage("""
               {
               message:'{{message}}',
               messageSender:'{{messageSender}}'
               }
            """)
    MessageAnalyzeResponse analyze(@V("message") String message,
                                   @V("messageSender") String messageSenderName
    );


    record MessageAnalyzeResponse(
            @Description("""
                    The name of the conversion participant to whom the message was directly addressed.
                    If no one is explicitly addressed or the addressed person is not a participant of the chat,
                    return an empty string.
                    """)
            String directlyAddressedTo
    ) {
    }
}


And in the below YAML configuration, you can configure its model name and disable it if you want:

YAML
 
  moderator-assistant:
    enabled: true
    model-name: "gpt-4o-mini"


If enabled, the moderator is configured as follows:

Java
 
    @ConditionalOnProperty(
            value = "multiassistant.moderator-assistant.enabled",
            havingValue = "true",
            matchIfMissing = true)
    @Bean("moderatorAssistant")
    ModeratorAssistant moderatorAssistant(@Qualifier("moderatorAssistantChatLanguageModel") ChatLanguageModel chatLanguageModel,
                                          @Qualifier("moderatorAssistantChatMemory") ChatMemory chatMemory,
                                          UserConfigurationProperties userConfigurationProperties,
                                          FirstAssistantConfigurationProperties firstAssistantConfiguration,
                                          SecondAssistantConfigurationProperties secondAssistantConfiguration) {

        Map<String, Object> promptVariables = Map.of(
                "userName", userConfigurationProperties.description().name(),
                "userDescription", "He is the human participant of the chat.",
                "firstAssistantName", firstAssistantConfiguration.description().name(),
                "firstAssistantDescription", firstAssistantConfiguration.description().persona(),
                "secondAssistantName", secondAssistantConfiguration.description().name(),
                "secondAssistantDescription", secondAssistantConfiguration.description().persona()
        );


        return AiServices
                .builder(ModeratorAssistant.class)
                .chatLanguageModel(chatLanguageModel)
                .systemMessageProvider(memoryId -> PromptTemplate.from("""
                        You are a moderator of a team chat. You are responsible for analyzing every message before
                        it is sent to the team chat.
                        There are three participants in the team chat. One of them if a human with a name but without further information
                        and two AI participants with names and a description of their personality.
                        The participants of the conversation are:
                            1. participant:
                                -name: {{userName}},
                                -description: {{userDescription}}
                            2. participant:
                                -name: {{firstAssistantName}},
                                -description: {{firstAssistantDescription}}
                            3. participant:
                                -name: {{secondAssistantName}},
                                -description: {{secondAssistantDescription}}
                        """).apply(promptVariables).text())
                .chatMemory(chatMemory)

                .build();
    }


As shown in the configuration above, the system message not only explains that the assistant's role is to analyze incoming messages, but it also explicitly lists the names and descriptions of all participants. This gives the moderator enough context to determine who a message might be addressed to later on.

If the Moderator is not enabled, a placeholder ModeratorAssistant is used instead. This version does not perform any actual validation and always returns an empty string when asked to determine the name of the addressed participant.

Java
 
    @ConditionalOnProperty(
            value = "multiassistant.moderator-assistant.enabled",
            havingValue = "false",
            matchIfMissing = false)
    @Bean("moderatorAssistant")
    ModeratorAssistant nonFunctionalModeratorAssistant() {
        return new ModeratorAssistant() {
            @Override
            public MessageAnalyzeResponse analyze(String message, String messageSenderName) {
                return new MessageAnalyzeResponse("");
            }
        };
    }


Each time before sending a message notification, we determine whether the message is directly addressed to a specific participant. If so, the recipient's name is included in the directlyAddressedParticipant field of the message notification sent to each AI assistant. Additionally, we expect the assistant to include the same name in its response's directlyAddressedParticipant field if the message was directed at someone.

If a message is clearly addressed to a specific participant, and that participant is not the current assistant, the RelevanceCalculator considers the response irrelevant, and the message is not forwarded to anyone. This means that when the ModeratorAssistant is enabled, a message like "How are you, Bill?" will only trigger a response from the assistant named Bill.

In contrast, a general message such as "Hello everyone!" may be answered by all participants. If the ModeratorAssistant is disabled, the assistants are free to respond more openly, but this can result in situations where the reply comes from someone other than the intended recipient, or from multiple assistants at once.

Application User Interface

I tried to keep the front end of the application as simple as possible. The client side is composed of the following building blocks:

  • For resolving client-side dependencies, I used WebJars.
  • To ensure basic styling, I included Bootstrap 5 CSS and a small custom style.css.
  • The front end itself is nothing more than a single Thymeleaf template that uses HTMX to enable dynamic behavior.

The dependencies of the front end are the following:

XML
 
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
<dependency>
    <groupId>org.webjars</groupId>
    <artifactId>bootstrap</artifactId>
    <version>5.3.3</version>
</dependency>
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>htmx.org</artifactId>
    <version>2.0.4</version>
</dependency>
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>htmx-ext-json-enc</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>htmx-ext-sse</artifactId>
    <version>2.2.3</version>
</dependency>
<dependency>
    <groupId>org.webjars</groupId>
    <artifactId>webjars-locator</artifactId>
    <version>0.52</version>
</dependency>


In the dependencies above:

  • htmx-ext-json-enc is an HTMX extension that makes it possible to send data as JSON.
  • htmx-ext-sse is an HTMX extension that enables handling Server-Sent Events on the UI.
  • webjars-locator allows you to reference client-side dependencies in a version-agnostic way.

To make the front end able to access the WebJar dependencies, the following configuration is required:

Java
 
@Configuration
public class WebConfiguration implements WebMvcConfigurer {
    @Override
    public void addResourceHandlers(ResourceHandlerRegistry registry) {
        registry
            .addResourceHandler("/webjars/**")
            .addResourceLocations("/webjars/")
            .resourceChain(false);
    }
}


The custom style.css looks like this:

CSS
 
.chat-container {
    max-width: 800px;
    margin: 20px auto;
    display: flex;
    flex-direction: column;
}
.chat-box {
    height: 1000px;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 10px;
    border-radius: 5px;
    background: #ffffff;
    display: flex;
    flex-direction: column;
}
.message {
    padding: 8px;
    border-radius: 8px;
    margin-bottom: 10px;
    max-width: 75%;
    word-wrap: break-word;
    overflow-wrap: break-word;
    white-space: normal;
}
.message.user {
    background: #d1e7dd;
    align-self: flex-start;
    border: 2px solid #ccc;
}
.message.ai {
    background: #e2e3e5;
    align-self: flex-end;
    border: 2px solid #ccc;
}
.message .author {
    font-weight: bold;
    font-size: 0.9em;
    margin-bottom: 2px;
    border-bottom: 1px solid #ccc;
}
.message .message-content {
    white-space: pre-wrap;
}


The front end is intentionally minimal and consists of a single index.html page. It lets you submit messages via POST and displays incoming message items using Server-Sent Events:

HTML
 
<!DOCTYPE HTML>
<html lang="en" xmlns:th="http://www.thymeleaf.org" xmlns="http://www.w3.org/1999/html">
<head>
    <meta charset="utf-8">
    <title>Multi Assistant Chat</title>

    <script th:src="@{/webjars/htmx.org/dist/htmx.min.js}"></script>
    <script th:src="@{/webjars/htmx-ext-json-enc/dist/json-enc.min.js}"></script>
    <script th:src="@{/webjars/htmx-ext-sse/dist/sse.min.js}"></script>
    <link th:href="@{/webjars/bootstrap/css/bootstrap.min.css}" rel="stylesheet">
    <link th:href="@{/css/style.css}" rel="stylesheet">

</head>
<body>
<div class="chat-container">
    <div id="stream-container" hx-ext="sse" sse-connect="message/stream"
         hx-target="#chatBox"
    >
        <div class="chat-box" id="chatBox"
             sse-swap="message"
             hx-swap=beforeend
             hx-on::after-settle="this.scrollTo(0, this.scrollHeight);"
        >
        </div>
    </div>

    <form th:action="@{/message/send}" method="post" class="mt-3"
          hx-post="/message/send"
          hx-ext="json-enc"
          hx-target="#chatBox"
          hx-swap="beforeend"
          hx-on::after-request="if(event.detail.successful) this.reset()"
    >
        <div class="input-group">
            <input id="message-input" type="text" class="form-control" name="message" placeholder="Type a message..." required>
            <button type="submit" class="btn btn-primary">Send</button>
        </div>
    </form>
</div>

</body>
</html>


Below is the controller, where the subscribeMessages method handles the creation of the SSE connection, while the sendMessage method initializes a new conversation thread whenever a user message is received.

It also sends the user's own message back through the SseEmitter (so that it appears in the chat) and forwards it to the other participants via the MessagePublisher:

Java
 
@RequestMapping("/message")
@RestController
public class MessageController {
    private final UserConfigurationProperties userConfigurationProperties;
    private final MessagePublisher messagePublisher;
    private final ConversationThreadHandler conversationThreadHandler;
    private final MessageStreamService messageStreamService;

    public MessageController(UserConfigurationProperties userConfigurationProperties,
                             MessagePublisher messagePublisher,
                             ConversationThreadHandler conversationThreadHandler,
                             MessageStreamService messageStreamService) {
        this.userConfigurationProperties = userConfigurationProperties;
        this.messagePublisher = messagePublisher;
        this.conversationThreadHandler = conversationThreadHandler;
        this.messageStreamService = messageStreamService;
    }

    @GetMapping(path = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter subscribeMessages() {
        return messageStreamService.init();
    }

    public record UserMessage(String message) {
    }

    @PostMapping(value = "/send")
    public ResponseEntity<Void> sendMessage(@RequestBody UserMessage userMessage) throws IOException {
        var messageToSend = new Message()
                .setConversationThreadId(conversationThreadHandler.initConversationThread())
                .setName(userConfigurationProperties.description().name())
                .setMessage(userMessage.message());
        this.messageStreamService
                .getStream().orElseThrow(() -> new RuntimeException("Stream not initialized"))
                .send(MessageHtmlUtil.messageToUserHtmlChatItem(messageToSend));

        this.messagePublisher.publish(messageToSend);

        return ResponseEntity.ok().build();
    }

}


There are two more things worth mentioning. The first is the MessageHtmlUtil, which is used both in this controller and in the MessageStreamConsumedMessageProcessor. HTMX doesn't expect JSON responses; instead, it works with server-rendered HTML fragments returned in response to user interactions, which are then swapped into the DOM.

To support this behavior, I created the following utility class to convert both user and assistant messages into proper HTML chat items:

Java
 
public final class MessageHtmlUtil {
    private static final String MESSAGE_ITEM = """
                        <div class="message %s">
                            <div class="author">%s</div>
                            <div class="message-content">%s</div>
                        </div>
            """;

    private MessageHtmlUtil() {
    }

    /**
     * Converts a message to an HTML chat item representing a user message.
     *
     * @param message the message to convert
     * @return the HTML chat item as a string
     */
    public static String messageToUserHtmlChatItem(Message message) {
        return MESSAGE_ITEM.formatted("user", message.getName(), message.getMessage());
    }

    /**
     * Converts a message to an HTML chat item representing an AI message.
     *
     * @param message the message to convert
     * @return the HTML chat item as a string
     */
    public static String messageToAiHtmlChatItem(Message message) {
        return MESSAGE_ITEM.formatted("ai", message.getName(), message.getMessage());
    }
}


The second is the MessageStreamService, which handles the initialization and access to the SseEmitter. Note that in more complex applications that support multiple users, this would require more careful handling, but for this example, managing a single SseEmitter instance is sufficient:

Java
 
@Service
public class SSEMessageStreamService implements MessageStreamService {
    private SseEmitter sseEmitter;

    @Override
    public SseEmitter init() {
        if (sseEmitter == null) {
            sseEmitter = new SseEmitter(Long.MAX_VALUE);
        }
        return sseEmitter;
    }

    @Override
    public Optional<SseEmitter> getStream() {
        return Optional.ofNullable(sseEmitter);
    }
}


This simple UI setup is enough to support interactive conversations between the user and the assistants without introducing unnecessary complexity.

AI Assistant Interaction

Now that everything is wired together, here’s a short conversation with Bill, a grumpy old cinematographer who hates modern films, and Joe, a sarcastic young thrash metal fan.

First, a greeting:

Greeting


A little conversation about music:

Conversation about music 1

Conversation about music 2


And a little chat about movies:

Conversation about movies 1

Conversation about movies 2

Conversation about movies 3


Finally, let's say goodbye:

Goodbye

Conclusion

I hope you enjoyed this little conversation with Bill and Joe. I tried to show the most important parts of the application, but of course, the full source code is available on GitHub if you’d like to explore it in more detail.

Feel free to give it a try. I’m sure you can come up with personalities even more interesting than the two I used here. Just keep in mind that if you change the personalities or the chat model, you might need to fine-tune the thresholds to get the best experience. While some of the functionality, like relevance evaluation, could be handled by tools like TruLens, this project was more about modeling behavior through direct interaction logic. And if I’m being honest, I also had a chat with John Spartan and Simon Phoenix from the Demolition Man movie about the three seashells, but since they didn’t tell me anything new, I left that one out of the article.

AI Assistant (by Speaktoit) Spring Boot

Published at DZone with permission of Tamas Kiss. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • How AI Coding Assistants Are Changing Developer Flow
  • How AI Is Rewriting Full-Stack Java Systems: Practical Patterns with Spring Boot, Kafka and WebSockets
  • Building Scalable Agentic Assistants: A Graph-Based Approach
  • Zero-Cost AI with Java

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook