How to Build a Self-Evolving AI Agent That Learns From Failure
This guide demonstrates how to transform brittle AI agents into resilient systems that reflect on failures and retain learnings to avoid repeating errors.
For developers building autonomous systems, today's generative AI agents present a fundamental challenge: they are amnesiacs. An agent can execute a complex task, fail, and then repeat the same mistake five minutes later. Their capabilities are "test-time static," meaning they are frozen at the moment their training ends. They cannot learn from their interactions, retain valuable insights, or correct their own errors.
For developers and architects trying to build reliable autonomous systems, this is the primary barrier to adoption. An unreliable agent is not autonomous. It is a brittle system that creates technical debt.
This is why the true frontier of AI is not just about building larger models but about creating agents that can learn. This article will show you how to build a simple self-evolving agent in Python. We will ditch the academic theory and write code for an agent that learns from its failures using a persistent, structured memory we call a "ReasoningBank."
Think of ReasoningBank as retrieval-augmented generation (RAG) for strategy rather than data. Instead of fetching static documents, the agent retrieves high-level reasoning patterns distilled from its past successes and failures. This allows the agent to consult a dynamic playbook of proven strategies at inference time, which it continuously updates after every interaction.
The Problem: Static, Brittle Agents
Today's LLM agents combine a core model with planning modules and tools, making them vulnerable to error propagation. A single root-cause error, such as misusing a tool or calling an unreliable API, can cascade through all subsequent steps and ultimately lead to task failure.
This makes them unsuitable for the "long-horizon" technical challenges that define real-world value, such as managing a software project, conducting complex data analysis, or automating multi-step DevOps workflows.
A Practical Solution: Building a "ReasoningBank" in Python
Let's build an agent that stops making the same mistake twice. The core mechanism involves a shift from amnesia to experience facilitated by a persistent and structured memory. Instead of treating every task as its first, this architecture allows an agent to run a continuous "Plan-Execute-Reflect-Memorize" loop.
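Before the concrete pieces, here is a minimal sketch of one pass through that loop. The name run_loop and its injected plan, execute, and reflect callables are hypothetical placeholders for illustration; they are not the classes built in the steps that follow.

```python
from typing import Any, Callable, Dict, List, Optional

Task = Dict[str, Any]
Lesson = Dict[str, Any]


def run_loop(
    task: Task,
    memory: List[Lesson],
    plan: Callable[[Task, List[Lesson]], Optional[Task]],
    execute: Callable[[Task], Any],
    reflect: Callable[[Task, Exception], Lesson],
) -> Optional[Any]:
    """One pass of Plan-Execute-Reflect-Memorize (hypothetical skeleton)."""
    planned = plan(task, memory)          # PLAN: consult past lessons
    if planned is None:                   # a lesson said "do not run this"
        return None
    try:
        return execute(planned)           # EXECUTE
    except Exception as error:
        lesson = reflect(planned, error)  # REFLECT: turn the error into a lesson
        memory.append(lesson)             # MEMORIZE: keep it for the next pass
        return None
```

Each stage is passed in as a function, so the loop itself stays the same while the planner, the tool, and the reflection logic can be swapped out independently.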
Step 1: Define a Relatable Task
Before we build the brain, let's define the work. We need a task that simulates a real-world scenario, like fetching data from a project management API.
In a perfect world, this function always works. In our simulation, it is "flaky." It fails under specific conditions (an expired key or a bad host), which forces our agent to adapt.
import json
import os
from typing import List, Dict, Any, Optional

# --- Our Simulated "Flaky Tool" ---
def call_external_api(project_id: str, api_key: str):
    """
    Simulates a tool call that usually works but fails under specific conditions.
    Target: Retrieve project data.
    """
    # Specific failure 1: Expired Key
    if project_id == "project-123" and api_key == "old-key-xyz":
        raise ValueError("APIError: Invalid API Key. Key 'old-key-xyz' is expired.")
    # Specific failure 2: Bad Host
    if project_id == "project-789":
        raise ConnectionError("NetworkError: Cannot connect to host for project-789.")
    # Success path
    print(f"Successfully called API for {project_id} with {api_key}")
    return f"Success: Data for {project_id}"
Step 2: The Memory (ReasoningBank)
Now we need a place to store lessons. This simple class saves insights to a local JSON file so the agent remembers them even after the script restarts.
REASONING_BANK_FILE = "reasoning_bank.json"

class ReasoningBank:
    """A simple JSON-based persistent memory for our agent."""

    def __init__(self):
        self.memory: List[Dict[str, Any]] = []
        self.load_memory()

    def load_memory(self):
        """Loads lessons from the JSON file."""
        if os.path.exists(REASONING_BANK_FILE):
            try:
                with open(REASONING_BANK_FILE, 'r') as f:
                    self.memory = json.load(f)
                print(f"Loaded {len(self.memory)} lessons from {REASONING_BANK_FILE}")
            except json.JSONDecodeError:
                self.memory = []
        else:
            print("No reasoning bank found, starting fresh.")

    def save_memory(self):
        """Saves all lessons back to the JSON file."""
        with open(REASONING_BANK_FILE, 'w') as f:
            json.dump(self.memory, f, indent=2)

    def add_lesson(self, lesson: Dict[str, Any]):
        """Adds a new, structured lesson to the memory."""
        self.memory.append(lesson)
        self.save_memory()

    def find_relevant_lessons(self, task_description: Dict[str, Any]) -> List[Dict[str, Any]]:
        """
        Finds lessons that match the current task.
        A real implementation might use vector search here.
        """
        relevant_lessons = []
        current_project = task_description.get('project_id')
        if not current_project:
            return []
        for lesson in self.memory:
            # We match lessons specifically to the project ID for this demo
            if lesson.get("context", {}).get("project_id") == current_project:
                relevant_lessons.append(lesson)
        return relevant_lessons
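The docstring above notes that a real implementation might use vector search. As a rough stand-in that needs no embedding model, the sketch below ranks lessons by token overlap (Jaccard similarity) between a free-text query and each lesson's error and root_cause fields. The function names, the top_k and min_score parameters, and the threshold value are all assumptions for illustration, and token overlap is a much cruder signal than real vector search.

```python
import re
from typing import Any, Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_lessons(
    query: str,
    lessons: List[Dict[str, Any]],
    top_k: int = 3,            # hypothetical default
    min_score: float = 0.2,    # hypothetical threshold
) -> List[Dict[str, Any]]:
    """Return up to top_k lessons whose text overlaps the query, best match first."""
    query_tokens = tokenize(query)
    scored: List[Tuple[float, Dict[str, Any]]] = []
    for lesson in lessons:
        text = f"{lesson.get('error', '')} {lesson.get('root_cause', '')}"
        score = jaccard(query_tokens, tokenize(text))
        if score >= min_score:
            scored.append((score, lesson))
    # Sort on the score only, so the lesson dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in scored[:top_k]]
```

Unlike the exact project_id match, this lets a lesson learned on one project surface for a similar error on another, at the cost of occasional false matches.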
Step 3: The Brain (Reflection)
This function is the most critical component. It translates raw error text into actionable strategies. It is the difference between logging an error and understanding it.
def reflect_on_failure(task: Dict[str, Any], error: Exception) -> Dict[str, Any]:
    """
    Analyzes the error to create a generalizable, actionable lesson.
    """
    print(f"\n--- REFLECTION ---")
    error_str = str(error)
    lesson = {
        "type": "FAILURE_AVOIDANCE",
        "context": task,
        "error": error_str,
    }
    # Root Cause Analysis Logic
    if "Invalid API Key" in error_str and "expired" in error_str:
        lesson["root_cause"] = "The API key used for this project is expired."
        lesson["strategy"] = {
            "action": "replace_param",
            "param": "api_key",
            "old_value": task.get("api_key"),
            "new_value": "new-key-abc"  # In prod, this would be fetched securely
        }
        print("Lesson: Expired key detected. Strategy: Update key.")
    elif "NetworkError" in error_str:
        lesson["root_cause"] = "The project's host is unreachable."
        lesson["strategy"] = {
            "action": "skip_task",
            "reason": "Project host is down. Do not retry."
        }
        print("Lesson: Host unreachable. Strategy: Skip future attempts.")
    else:
        lesson["root_cause"] = "Unknown error."
        lesson["strategy"] = {"action": "log_and_skip", "reason": "Unhandled error."}
    print("--- END REFLECTION ---")
    return lesson
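As the number of known failure modes grows, the if/elif chain can get unwieldy. One possible alternative, sketched here as an assumption rather than as part of the original design, is a rule table that maps an error-message predicate to a root cause and a strategy. RULES and diagnose are hypothetical names; the strings mirror the ones used above, and the old_value field is left out for brevity.

```python
from typing import Any, Callable, Dict, List, Tuple

Strategy = Dict[str, Any]
# (predicate over the error message, root cause, strategy template)
Rule = Tuple[Callable[[str], bool], str, Strategy]

RULES: List[Rule] = [
    (
        lambda msg: "Invalid API Key" in msg and "expired" in msg,
        "The API key used for this project is expired.",
        # Hard-coded replacement key, as in the demo; fetch it securely in practice.
        {"action": "replace_param", "param": "api_key", "new_value": "new-key-abc"},
    ),
    (
        lambda msg: "NetworkError" in msg,
        "The project's host is unreachable.",
        {"action": "skip_task", "reason": "Project host is down. Do not retry."},
    ),
]


def diagnose(error: Exception) -> Tuple[str, Strategy]:
    """Return (root_cause, strategy) for the first rule that matches the error text."""
    msg = str(error)
    for matches, root_cause, strategy in RULES:
        if matches(msg):
            return root_cause, dict(strategy)  # copy so callers cannot mutate the table
    return "Unknown error.", {"action": "log_and_skip", "reason": "Unhandled error."}
```

Adding a new failure mode then means appending one tuple to RULES instead of editing the function body.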
Step 4: The Agent Structure
Finally, we assemble the agent. Its execute_task method calls create_plan first, so the agent consults the ReasoningBank before it acts.
class SelfEvolvingAgent:
    def __init__(self):
        self.memory_bank = ReasoningBank()

    def create_plan(self, task: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """
        Consults memory before execution.
        """
        print(f"\nPlanning task for project: {task.get('project_id')}")
        relevant_lessons = self.memory_bank.find_relevant_lessons(task)
        if not relevant_lessons:
            return task  # No lessons, proceed as planned
        # Apply the most recent lesson
        latest_lesson = relevant_lessons[-1]
        strategy = latest_lesson.get("strategy", {})
        action = strategy.get("action")
        print(f"Found relevant lesson: {action}")
        if action == "replace_param":
            new_task = task.copy()
            param = strategy.get("param")
            new_val = strategy.get("new_value")
            if param in new_task:
                print(f"Applying lesson: Replacing '{param}' with '{new_val}'")
                new_task[param] = new_val
            return new_task
        elif action == "skip_task":
            print(f"Applying lesson: Skipping task. Reason: {strategy.get('reason')}")
            return None  # None signals "do not execute"
        return task

    def execute_task(self, task: Dict[str, Any]):
        # 1. PLAN (Consult memory)
        plan = self.create_plan(task)
        if plan is None:
            print("--- Task execution skipped based on past failures. ---")
            return
        # 2. EXECUTE
        try:
            print(f"Executing task with params: {plan}")
            result = call_external_api(
                project_id=plan.get("project_id"),
                api_key=plan.get("api_key")
            )
            print(f"--- Task Succeeded: {result} ---")
        # 3. REFLECT (On failure)
        except (ValueError, ConnectionError) as e:
            print(f"--- Task Failed: {e} ---")
            failure_lesson = reflect_on_failure(plan, e)
            self.memory_bank.add_lesson(failure_lesson)
Putting It All Together: The Execution Flow
Now we can watch the agent learn in real time. We will interleave the execution code with the agent's reasoning process.
First Attempt
The agent tries task_1 with old-key-xyz. The call_external_api function throws a ValueError. The except block catches this.
# Clean setup for demo
if os.path.exists(REASONING_BANK_FILE):
    os.remove(REASONING_BANK_FILE)
agent = SelfEvolvingAgent()
task_1 = {
    "project_id": "project-123",
    "api_key": "old-key-xyz"
}
print("\n========= ATTEMPT 1 (Expect Failure) =========")
agent.execute_task(task_1)
Reflect and memorize: The reflect_on_failure() function is triggered. It analyzes the error message, identifies the root cause (an expired API key), and creates a structured lesson with a "replace_param" strategy. This lesson is saved to reasoning_bank.json.
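To make the stored lesson concrete, the sketch below rebuilds by hand the record that reflect_on_failure() should produce for this first failure, using the field names and values from the code above. The variable name expected_lesson is only for illustration.

```python
import json

# Hand-built copy of the lesson expected after the first failed attempt.
expected_lesson = {
    "type": "FAILURE_AVOIDANCE",
    "context": {"project_id": "project-123", "api_key": "old-key-xyz"},
    "error": "APIError: Invalid API Key. Key 'old-key-xyz' is expired.",
    "root_cause": "The API key used for this project is expired.",
    "strategy": {
        "action": "replace_param",
        "param": "api_key",
        "old_value": "old-key-xyz",
        "new_value": "new-key-abc",
    },
}

# The bank stores a list of lessons, so the file would hold a one-element list.
print(json.dumps([expected_lesson], indent=2))
```

Note that the context field is what find_relevant_lessons() matches on: any later task with the same project_id picks up this lesson.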
Second Attempt
The agent is asked to redo task_1. This time, create_plan() queries the ReasoningBank and retrieves the lesson. It applies the strategy by modifying the task to use new-key-abc, and the execution succeeds.
print("\n========= ATTEMPT 2 (Expect Success via Adaptation) =========")
agent.execute_task(task_1)
Third Attempt
The agent tries a new task, task_2. This project triggers a different error (ConnectionError). The agent has no prior lessons for this project, so it attempts execution and fails.
task_2 = {
    "project_id": "project-789",
    "api_key": "any-key"
}
print("\n========= ATTEMPT 3 (Expect Network Failure) =========")
agent.execute_task(task_2)
Reflect and memorize: The agent reflects and creates a new lesson with a "skip_task" strategy because the host is down.
Fourth Attempt
On its fourth attempt, the planner sees this "skip_task" lesson and decides not to attempt executing the task. This saves time and computational resources.
print("\n========= ATTEMPT 4 (Expect Skip) =========")
agent.execute_task(task_2)
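After the four attempts, it can be useful to check what the agent actually remembers. The helper below is a small sketch, with summarize_bank as a hypothetical name, that reads the JSON file and lists each lesson's project and strategy. It assumes the file layout written by save_memory() above.

```python
import json
import os
from typing import List, Tuple


def summarize_bank(path: str = "reasoning_bank.json") -> List[Tuple[str, str]]:
    """Return (project_id, strategy action) for every stored lesson."""
    if not os.path.exists(path):
        return []  # nothing learned yet
    with open(path, "r", encoding="utf-8") as f:
        lessons = json.load(f)
    return [
        (
            lesson.get("context", {}).get("project_id", "?"),
            lesson.get("strategy", {}).get("action", "?"),
        )
        for lesson in lessons
    ]


for project_id, action in summarize_bank():
    print(f"{project_id}: {action}")
```

Run after the demo, this should list two lessons: a replace_param lesson for project-123 and a skip_task lesson for project-789. Attempts 2 and 4 add nothing, because only failures are recorded.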
Conclusion
By moving from static, amnesiac agents to dynamic systems that learn, we unlock the door to true autonomy. The difference is a simple, persistent JSON file and a single reflect_on_failure function.
This is the shift from a brittle, black-box tool to an adaptive, resilient system. It is one that you can trust to manage a CI/CD pipeline, not just run a single script, precisely because it has memory and the capacity to improve.
Published at DZone with permission of Apratim Mukherjee. See the original article here.