
Why State Management Is the #1 Challenge for Agentic AI

Explore the unique state management challenges posed by agentic AI, and see why traditional cloud-native approaches fall short as AI agents evolve to learn and adapt over time.

By Jason Bloomberg · Apr. 09, 25 · Opinion

When you have a conversation with a chatbot, you want it to remember previous interactions within that conversation. That’s what it means to have a conversation, after all.

When you use generative AI (genAI) to perform some analysis task beyond a single response to a prompt, you want it to retain the context of earlier prompts within that task.

When a company wants AI to automate a workflow — a sequence of steps over time, with human input along the way — you want the AI to keep track of where each user is along their instance of the workflow.

These examples are all situations where we expect our AI to maintain state information — some persisted data that keeps track of interactions or automated tasks over time.

Now that agentic AI is here, however, these examples of state management don’t go far enough.

The missing piece: we want AI to learn. We want our agents to get smarter over time.

Suddenly, all our traditional approaches to managing the state of interactions in a distributed computing environment fall short.

Give Me a Cookie

Every generation of technology has had to deal with the central computing challenge of how to manage state.

The default approach, writing state information for every user and every interaction to a database on the server, worked well enough up to a point.

However, keeping track of state information on a server somewhere doesn’t scale. Eventually, stateful applications bog down.

In contrast, stateless interactions offer massive scale. When the back-end, server-side parts of our applications don’t have to keep track of users or their requests, then scaling them out is a simple exercise.

Unfortunately, so many things we might want to do require us to keep track of something over time, which requires state management.
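The trade-off can be sketched in a few lines. This is illustrative only (no specific framework is assumed): the stateful handler works only if every replica sees the same server-side session table, while the stateless handler can run on any replica because the client sends everything needed with each request.

```python
# Stateful: the server keeps a per-user session table. Any replica
# handling this user's next request must see the same table, which
# is what makes scaling out stateful services hard.
sessions: dict[str, int] = {}

def stateful_handler(user_id: str) -> int:
    sessions[user_id] = sessions.get(user_id, 0) + 1
    return sessions[user_id]

# Stateless: everything needed to answer arrives with the request,
# so any replica can serve it. The client carries the counter
# (for example, in a cookie) and sends it back each time.
def stateless_handler(counter_from_client: int) -> int:
    return counter_from_client + 1
```

With the stateless version, adding replicas behind a load balancer is trivial; with the stateful one, you must first solve where `sessions` lives.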

We had to figure this out for the Web. Then, we had to figure it out for the cloud — which eventually meant that we had to figure it out once again for microservices.

Now, AI agents are here. Guess what? We need to figure out state management all over again.

Microservices to the Rescue?

Most implementations of AI agents run as microservices. You might think, therefore, that microservices would address the AI state management problem.

Microservices are inherently stateless, enabling them to scale massively, since any microservice instance can respond to a request just as well as any other identical instance.

Statelessness thus enables microservices’ inherently ephemeral and elastic nature, properties that arguably make cloud native computing what it is today.

Managing state with microservices without limiting their scalability and slowing everything down is one of the most important architectural challenges of cloud native computing.

Kubernetes handles state management by adding an abstraction layer. StatefulSets are objects that enable microservices to maintain state information by abstracting the persistence tier.

Stateful microservices must still write state information to a database somewhere, but with StatefulSets, each microservice doesn’t have to worry about the specifics.

The Kubernetes infrastructure handles data scalability behind the scenes, along with managing the data consistency that has always been the challenge to building massively scalable persistence infrastructure.
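As a concrete illustration (a minimal sketch, not a production manifest; the image and names are hypothetical), a StatefulSet gives each replica a stable identity and its own persistent volume, abstracting the persistence tier as described above:

```yaml
# Illustrative only: each replica gets a stable name (agent-0, agent-1, ...)
# and a dedicated PersistentVolumeClaim created from the template below.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: agent
spec:
  serviceName: agent
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
        - name: agent
          image: example/agent:latest   # hypothetical image
          volumeMounts:
            - name: state
              mountPath: /var/lib/agent
  volumeClaimTemplates:
    - metadata:
        name: state
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```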

Given AI agents typically run as microservices, can cloud native computing address their state management challenges?

No. There is still something missing.

The AI Agent State Conundrum

Many of today’s genAI applications are stateless — feed them a prompt, get a response, and then they forget all about you and what you asked before.

Cookies (or generally, maintaining state on the client) and microservices (maintaining state on the server) are both necessary for managing AI state. However, they are not sufficient.

The first dimension: keeping track of what each user is doing. Maintaining state on the client can handle this.

For example, a chatbot that keeps track of each conversation with each user. Bonus points if a user can pick up a conversation after leaving the chatbot and coming back later.
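This first dimension can be sketched as per-session conversation history, keyed by a session ID so a user can leave and resume later. All names here are illustrative:

```python
from collections import defaultdict

# Conversation history per session. Keying by session ID is what lets
# a user pick the conversation back up after leaving and returning.
histories: dict[str, list[str]] = defaultdict(list)

def chat_turn(session_id: str, user_message: str) -> list[str]:
    """Record the message and return the full context the model would see."""
    histories[session_id].append(user_message)
    # A real chatbot would pass this history to the model as context.
    return list(histories[session_id])
```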

The second dimension: keeping track of interactions across users. Now, we can call upon stateful microservices.

For example, an AI agent might update a CRM app. Other people, and indeed other AI agents, will see those updates and be able to make decisions based on the new information.

Cloud native computing handles such multi-user situations well. By abstracting the persistence tier (in this case, the data store behind the CRM app), the Kubernetes infrastructure can scale.

What’s missing from this story is the third dimension: AI agents that can learn.

A chatbot, for example, might get smarter over time about a particular user’s preferences. A travel chatbot should ideally understand simple things like whether a user prefers an aisle or window seat – but should also learn more subtle, complex preferences specific to each user or relationships among users (for instance, your spouse’s preferences as well as your own when you travel together).

The AI agent should also get smarter over time about all collaborative interactions it is called upon to support. Simply updating CRM records, for example, is not a particularly valuable task for an AI agent. Understanding how best to leverage the CRM to optimize sales efforts, a task that requires agents to learn over time, is a different story.

Why AI Agent State Management Is Different

The behavior of an AI agent (or any other AI-based application, for that matter) depends upon its training data. Change the underlying data, and you change the agent’s behavior.

For AI agents to learn, they must feed information from ongoing user interactions back to the training data, thus changing the agents’ behavior with each iteration.

In other words, changing the training data changes the state of the agent. Prompts and other contextual information about the behavior of the agent all become training data for any agent with the ability to learn over time.
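The feedback loop can be sketched as follows. This is a simplification with illustrative names: each interaction becomes a candidate training example, so the training set itself becomes part of the agent's state.

```python
# The data that shapes the agent's behavior. Appending to it is,
# in effect, mutating the agent's state.
training_data: list[dict] = []

def record_interaction(prompt: str, response: str, feedback: float) -> None:
    """Append an interaction to the training set; a later fine-tuning or
    retrieval step would consume it, changing the agent's behavior."""
    training_data.append(
        {"prompt": prompt, "response": response, "feedback": feedback}
    )

def useful_examples(threshold: float = 0.5) -> list[dict]:
    """Select the interactions worth learning from."""
    return [ex for ex in training_data if ex["feedback"] >= threshold]
```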

This iterative learning behavior — and hence, the training-based state management challenge — is specific to agentic AI because such training will make AI agents smarter over time.

However, we must still deal with whether we want individual agent instances to learn about individual user preferences, or to learn from all interactions across sets of users, even if they are interacting with distinct AI agent instances.

Ideally, we’d want a mix of both — agents getting smarter from interacting with many users while simultaneously getting smarter about each set of interactions with every individual, thus becoming personalized.

The Unsolved Challenge of AI State Management

How, then, do we tackle this complex state management challenge, given the training data themselves represent the state of each agent?

No one has solved this problem yet (to my knowledge) — but it’s clear that people have at least discerned the underlying issue.

We’ve seen this problem appear as a trope in fiction. Remember the movie Her, where a hapless Joaquin Phoenix falls in love with a ‘female’ AI agent?

The agent learns over time in a very personal way specific to Phoenix’s character. Eventually, ‘she’ becomes a unique individual, only to (spoiler alert!) be reset to ‘her’ factory default, thus ‘killing’ her.

Where fiction goes, soon goes reality. What information do we want AI agents to keep track of? Given we want them to learn, then just what do we want them to learn, and when? And how do we decide?

Then, of course, we need to figure out how to build the technology necessary to support the ever-changing state of each of our AI agents — while enabling them to scale without breaking the bank.


Published at DZone with permission of Jason Bloomberg, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
