DZone
Idempotency in AI Tools: The Most Expensive Thing Teams Forget

An analysis of how retries cause duplicate inference in AI tools and how idempotency keeps production systems predictable and cost-controlled.

By Aditya Gupta · Mar. 02, 26 · Analysis

When AI tools move from a test environment to real-world use, the first “surprise” a developer encounters is rarely about accuracy. It’s usually something more problematic: the system behaves inconsistently, costs climb faster than expected, and the same job seems to run multiple times.

That’s not an AI problem. That’s a distributed systems problem. In AI systems, though, this failure is especially painful because every duplicate run has a direct dollar cost. Idempotency is the fix. Not the only fix, but often the most impactful one.

What Does Idempotency Mean?

An operation is idempotent if processing the same request or job more than once leaves the end result the same as processing it once. This matters because duplicate processing is normal in a production environment. It usually happens because of:

  • Network timeouts
  • User or client retries
  • Queue failures and redelivery
  • Worker failures mid-flight
  • Timeouts on long-running tasks
  • Race conditions between parallel users
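To make the definition concrete, here is a minimal sketch contrasting a non-idempotent operation with an idempotent one. The ledgers, function names, and request IDs are hypothetical illustrations, not part of any real API.

```python
from typing import Dict, List


# Non-idempotent: every call appends, so a retried request is counted twice.
def record_charge(ledger: List[int], amount: int) -> None:
    ledger.append(amount)


# Idempotent: the charge is keyed by a request ID, so repeating the same
# request writes the same entry again instead of adding a new one.
def record_charge_once(ledger: Dict[str, int], request_id: str, amount: int) -> None:
    ledger[request_id] = amount


plain: List[int] = []
record_charge(plain, 10)
record_charge(plain, 10)  # retry of the same logical request
print(sum(plain))  # 20: the retry doubled the total

keyed: Dict[str, int] = {}
record_charge_once(keyed, "req-1", 10)
record_charge_once(keyed, "req-1", 10)  # retry with the same request ID
print(sum(keyed.values()))  # 10: same result as a single call
```

The key idea is that the second version identifies *which* logical request it is handling, so a repeat can be recognized rather than re-applied.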

Impact of Non-Idempotency

In many non-AI systems, duplicate processing is a nuisance but rarely catastrophic. In AI tools, it is far more damaging: every duplicate directly erodes profitability, and it often goes unnoticed because nothing actually fails.

A duplicate inference call costs:

  • Extra tokens
  • Extra latency
  • Extra downstream load
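A back-of-the-envelope calculation shows how the token bill scales with duplicates. The job volume, token count, per-token price, and duplicate rate below are hypothetical placeholders, not measured figures; substitute your own numbers.

```python
def monthly_inference_cost(jobs: int, tokens_per_job: int,
                           price_per_1k_tokens: float, duplicate_rate: float) -> float:
    """Token cost when each logical job runs (1 + duplicate_rate) times on average."""
    executions = jobs * (1 + duplicate_rate)
    return executions * tokens_per_job / 1000 * price_per_1k_tokens


# Hypothetical inputs: 1M jobs per month, 2,000 tokens each, $0.01 per 1K tokens.
baseline = monthly_inference_cost(1_000_000, 2_000, 0.01, duplicate_rate=0.0)
with_dupes = monthly_inference_cost(1_000_000, 2_000, 0.01, duplicate_rate=0.05)
print(f"baseline: ${baseline:,.0f}, with 5% duplicates: ${with_dupes:,.0f}")
```

Under these assumptions, a 5% duplicate rate adds $1,000 a month in tokens alone, before counting the extra latency and downstream load.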

A Day-to-Day Example to Explain the Impact

Think of a warehouse fulfilling orders.

  • A customer places an order.
  • The system prints a packing slip.
  • The packer ships the item.

Now imagine the print request times out. The system retries. Two slips are printed, and two items get shipped. Nothing “crashed,” but the fulfillment cost just doubled, and someone now has a duplicate shipment to chase down.

AI tools behave the same way: a job is retried, and inference runs twice.

Reasons for Duplicates

1. Client or API Retries

A client sends a "start enrichment" request to the AI service and waits for a reply. If it doesn't hear back in time, it sends the request again, even though the first attempt may still be running or may already have finished. The service now has two requests for the same work.
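A common client-side mitigation is to generate the idempotency key once per logical request and send the same key on every retry, so the server can recognize repeats. This is a sketch under stated assumptions: `send_request` is a hypothetical transport function, `Timeout` is a hypothetical exception, and `flaky_server` simulates a server whose first reply is lost after the work was already done.

```python
import uuid
from typing import Any, Callable, Dict, Set


class Timeout(Exception):
    """Raised by the (hypothetical) transport when no reply arrives in time."""


def call_with_retries(send_request: Callable[[str, Dict[str, Any]], Dict[str, Any]],
                      payload: Dict[str, Any], max_attempts: int = 3) -> Dict[str, Any]:
    # Generate the key ONCE, outside the retry loop, so every attempt carries it.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request(idempotency_key, payload)
        except Timeout:
            if attempt == max_attempts:
                raise
    # Only reached if max_attempts < 1.
    raise ValueError("max_attempts must be at least 1")


# Simulated server: runs inference once per key; the first reply is "lost".
seen_keys: Set[str] = set()
inference_calls = 0
replies_dropped = 0


def flaky_server(key: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    global inference_calls, replies_dropped
    if key not in seen_keys:
        seen_keys.add(key)
        inference_calls += 1  # inference runs only for a key not seen before
    if replies_dropped == 0:
        replies_dropped += 1
        raise Timeout()  # work was done, but the client never hears back
    return {"status": "ok", "key": key}


result = call_with_retries(flaky_server, {"title": "hello"})
```

If the key were generated inside the loop, each retry would look like a brand-new request and the server would run inference again.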

2. Queue Redelivery

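Message queues such as Amazon SQS standard queues provide at-least-once delivery: they are designed for reliability, not for guaranteeing that a message is delivered exactly once. If a consumer doesn't delete a message before its visibility timeout expires, for example because processing ran long or the worker crashed, the same message is delivered again.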
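Redelivery can be simulated with a tiny in-memory queue. This is a deliberately simplified model of at-least-once delivery with a visibility timeout, not the SQS API: `ToyQueue`, its methods, and the `handle` consumer are hypothetical. The consumer deduplicates on the job ID carried in the message, so the redelivered message does not trigger a second inference.

```python
from typing import Dict, List, Optional, Set, Tuple


class ToyQueue:
    """At-least-once queue: a received message reappears unless deleted in time."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, Tuple[str, float]] = {}  # msg_id -> (job_id, visible_at)
        self._next_id = 0

    def send(self, job_id: str) -> None:
        self._messages[self._next_id] = (job_id, 0.0)
        self._next_id += 1

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        for msg_id, (job_id, visible_at) in self._messages.items():
            if visible_at <= now:
                # Hide the message until the visibility timeout expires.
                self._messages[msg_id] = (job_id, now + self.visibility_timeout)
                return msg_id, job_id
        return None

    def delete(self, msg_id: int) -> None:
        self._messages.pop(msg_id, None)


processed: Set[str] = set()
inference_runs: List[str] = []


def handle(job_id: str) -> None:
    if job_id in processed:
        return  # duplicate delivery: skip inference
    inference_runs.append(job_id)  # stand-in for the LLM call
    processed.add(job_id)


q = ToyQueue(visibility_timeout=30.0)
q.send("job-42")

# First delivery: the worker processes it but crashes before deleting it.
msg = q.receive(now=0.0)
assert msg is not None
handle(msg[1])

# After the timeout, the same message is delivered again.
msg = q.receive(now=31.0)
assert msg is not None
handle(msg[1])
q.delete(msg[0])
```

In production, the `processed` marker would live in the same durable store as the job results rather than in process memory; that is the role of the job table later in this article.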

3. Worker Fails Mid-Task

If a worker crashes while processing a job, the system may retry the job without knowing how far the failed attempt got, including whether inference already ran.

Basic Implementation and Failure

Here’s a pattern many early AI services adopt:

```python
def process_job(job_id: str, payload: dict) -> dict:
    # 1) Make a call to the LLM (llm_enrich and db are defined elsewhere)
    result = llm_enrich(payload)

    # 2) Save the LLM output
    db.save_result(job_id, result)

    return result
```

This works until something goes wrong between step 1 and step 2. If the worker crashes after inference but before saving, the system retries the job and calls the LLM again, paying for the same inference twice.
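The failure window is easy to reproduce. In this sketch, the `crash_before_save` flag is a hypothetical stand-in for a worker dying between the two steps, and a dict stands in for the database. The retry calls the LLM a second time because nothing recorded that the first inference already ran.

```python
from typing import Any, Dict

llm_calls = 0
db: Dict[str, Dict[str, Any]] = {}


class WorkerCrash(Exception):
    """Simulates the worker process dying mid-job."""


def llm_enrich(payload: Dict[str, Any]) -> Dict[str, Any]:
    global llm_calls
    llm_calls += 1  # each call here would be a billed inference
    return {"normalized_title": payload["title"].title()}


def process_job(job_id: str, payload: Dict[str, Any],
                crash_before_save: bool = False) -> Dict[str, Any]:
    result = llm_enrich(payload)   # step 1: inference
    if crash_before_save:
        raise WorkerCrash()        # the worker dies before step 2
    db[job_id] = result            # step 2: persist
    return result


try:
    process_job("job-1", {"title": "hello world"}, crash_before_save=True)
except WorkerCrash:
    pass
process_job("job-1", {"title": "hello world"})  # the retry
print(llm_calls)  # 2: one logical job, two paid inferences
```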

Improved Approach

One way to make AI jobs idempotent:

  1. Use a stable idempotency key (job_id).
  2. Store state before inference.
  3. If the job is already completed, return the stored result.
  4. Ensure only one worker “owns” the right to run inference at a time.
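Step 1 of the list above assumes a stable key already exists. When the caller doesn't supply one, one option is to derive it deterministically from the request content, so identical requests map to the same key. This is a sketch assuming that two requests with the same operation name and payload really are the same logical job, which is not true for every workload; the `derive_job_id` helper and the `enrich:v1` operation name are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def derive_job_id(operation: str, payload: Dict[str, Any]) -> str:
    # sort_keys and fixed separators give a canonical encoding, so dicts with
    # the same content but a different key order produce the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{operation}|{canonical}".encode("utf-8")).hexdigest()
    return f"{operation}:{digest}"


a = derive_job_id("enrich:v1", {"title": "hello", "category": "news"})
b = derive_job_id("enrich:v1", {"category": "news", "title": "hello"})
c = derive_job_id("enrich:v1", {"title": "hello", "category": "sports"})
```

Including a version in the operation name means a prompt or model change produces new keys instead of returning results computed under the old configuration.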

Below is a Python example that demonstrates the pattern.

Data Model

A table keyed by job_id:

  • status: PENDING | IN_PROGRESS | COMPLETED | FAILED
  • result: optional
  • attempts: count
  • updated_at: timestamp
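As a concrete illustration, the data model could be expressed as a relational table. This sketch uses Python's built-in `sqlite3` module with an in-memory database purely for demonstration; the column names mirror the list above, while storing `result` as JSON text and the `create_job_if_absent` helper are assumptions. `INSERT OR IGNORE` plays the role of an idempotent create: inserting the same `job_id` twice leaves a single row.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        job_id     TEXT PRIMARY KEY,          -- the idempotency key
        status     TEXT NOT NULL CHECK (status IN
                       ('PENDING', 'IN_PROGRESS', 'COMPLETED', 'FAILED')),
        result     TEXT,                      -- JSON-encoded output, NULL until done
        attempts   INTEGER NOT NULL DEFAULT 0,
        updated_at REAL NOT NULL              -- Unix timestamp
    )
    """
)


def create_job_if_absent(conn: sqlite3.Connection, job_id: str) -> bool:
    """Return True only for the call that actually created the row."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (job_id, status, updated_at) VALUES (?, 'PENDING', ?)",
        (job_id, time.time()),
    )
    conn.commit()
    return cur.rowcount == 1


first = create_job_if_absent(conn, "job-1")
second = create_job_if_absent(conn, "job-1")
```

The primary key on `job_id` is what makes the create idempotent: the database, not the application, rejects the second row.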

Code: Idempotent Job Execution

```python
import threading
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class JobRecord:
    job_id: str
    status: str  # PENDING | IN_PROGRESS | COMPLETED | FAILED
    result: Optional[Dict[str, Any]] = None
    attempts: int = 0
    updated_at: float = 0.0


class InMemoryJobStore:
    """
    This is a stand-in for DynamoDB/Postgres/etc.
    The focus is on the logic, not the storage engine.
    A single lock makes each method atomic within one process; a shared
    store needs conditional writes to give the same guarantee across workers.
    """
    def __init__(self, lease_seconds: float = 300.0):
        self._store: Dict[str, JobRecord] = {}
        self._lock = threading.Lock()
        # An IN_PROGRESS claim older than this is treated as abandoned (the
        # worker probably crashed) and may be reclaimed. Keep it longer than
        # the slowest expected inference.
        self._lease_seconds = lease_seconds

    def get(self, job_id: str) -> Optional[JobRecord]:
        with self._lock:
            return self._store.get(job_id)

    def put_if_absent(self, job: JobRecord) -> bool:
        with self._lock:
            if job.job_id in self._store:
                return False
            self._store[job.job_id] = job
            return True

    def try_mark_in_progress(self, job_id: str) -> bool:
        """
        In real systems this must be an atomic conditional update.
        Here the lock provides that atomicity within a single process.
        """
        with self._lock:
            rec = self._store.get(job_id)
            if rec is None or rec.status == "COMPLETED":
                return False
            now = time.time()
            if rec.status == "IN_PROGRESS" and now - rec.updated_at < self._lease_seconds:
                return False  # another worker holds a live claim
            # PENDING, FAILED, or an expired IN_PROGRESS claim: take ownership.
            rec.status = "IN_PROGRESS"
            rec.attempts += 1
            rec.updated_at = now
            return True

    def mark_completed(self, job_id: str, result: Dict[str, Any]) -> None:
        with self._lock:
            rec = self._store[job_id]
            rec.status = "COMPLETED"
            rec.result = result
            rec.updated_at = time.time()

    def mark_failed(self, job_id: str, error: str) -> None:
        with self._lock:
            rec = self._store[job_id]
            rec.status = "FAILED"
            rec.result = {"error": error}
            rec.updated_at = time.time()


def llm_enrich(payload: Dict[str, Any]) -> Dict[str, Any]:
    """
    Replace this with your real LLM/API call.
    """
    time.sleep(0.2)  # simulate inference latency
    return {
        "normalized_title": payload.get("title", "").strip().title(),
        "category": payload.get("category", "unknown"),
    }


def process_job_idempotent(job_store: InMemoryJobStore, job_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # 1) If already completed, return cached result
    existing = job_store.get(job_id)
    if existing and existing.status == "COMPLETED":
        return existing.result or {}

    # 2) Ensure the job record exists (idempotent create)
    job_store.put_if_absent(JobRecord(job_id=job_id, status="PENDING", updated_at=time.time()))

    # 3) Acquire the right to process (in real systems: conditional update)
    if not job_store.try_mark_in_progress(job_id):
        # Someone else is processing, or it already completed
        existing = job_store.get(job_id)
        if existing and existing.status == "COMPLETED":
            return existing.result or {}
        return {"status": "IN_PROGRESS", "job_id": job_id}

    # 4) Run inference once
    try:
        result = llm_enrich(payload)
        job_store.mark_completed(job_id, result)
        return result
    except Exception as e:
        job_store.mark_failed(job_id, str(e))
        raise
```


With this code in place:

  • If the same job is sent twice, it doesn’t automatically trigger two AI calls.
  • If a retry happens after completion, it returns the stored output.
  • If a job is already in progress, it avoids duplicate work and returns “in progress.”
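The in-memory store only works within a single process. In a shared database, the claim in step 3 is typically a single conditional write that succeeds only if the row is in a claimable state. This sketch uses `sqlite3` and assumes a `jobs` table shaped like the data model above, plus a lease: an `IN_PROGRESS` claim older than `lease_seconds` is treated as abandoned (for example, because the worker crashed) and can be reclaimed. The `try_claim` helper and the lease length are hypothetical; the lease should exceed the slowest expected inference, or a slow but healthy worker may be overtaken.

```python
import sqlite3
import time
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
    "result TEXT, attempts INTEGER NOT NULL DEFAULT 0, updated_at REAL NOT NULL)"
)
conn.execute("INSERT INTO jobs (job_id, status, updated_at) VALUES ('job-1', 'PENDING', 0)")
conn.commit()


def try_claim(conn: sqlite3.Connection, job_id: str, lease_seconds: float,
              now: Optional[float] = None) -> bool:
    """Atomically move a job to IN_PROGRESS; True means this caller owns it."""
    now = time.time() if now is None else now
    cur = conn.execute(
        """
        UPDATE jobs
           SET status = 'IN_PROGRESS', attempts = attempts + 1, updated_at = ?
         WHERE job_id = ?
           AND (status IN ('PENDING', 'FAILED')
                OR (status = 'IN_PROGRESS' AND updated_at < ?))
        """,
        (now, job_id, now - lease_seconds),
    )
    conn.commit()
    # The check and the write happen in one statement, so (in SQLite, where
    # writes are serialized) two workers cannot both claim the same row.
    return cur.rowcount == 1


won = try_claim(conn, "job-1", lease_seconds=300, now=1000.0)
lost = try_claim(conn, "job-1", lease_seconds=300, now=1010.0)       # live claim held
reclaimed = try_claim(conn, "job-1", lease_seconds=300, now=1400.0)  # lease expired
```

Passing `now` explicitly keeps the example deterministic; in a worker you would omit it and let the function read the clock.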

Comparison With vs. Without Idempotency

| Without idempotency | With idempotency |
|---|---|
| A retry can trigger a full inference run again | Retries resolve to state checks instead of inference calls |
| AI calls scale with delivery attempts, not logical jobs | AI calls map one-to-one with logical jobs |
| The same job may produce multiple outputs | Each job produces a single authoritative output |
| Logs contain repeated executions for the same job ID | Logs reflect a clear job lifecycle |
| Cost and behavior vary based on timing and failures | Cost and behavior remain predictable under retries |

A common mistake is to check only for COMPLETED. That handles retries that arrive after the job finishes, but it doesn’t stop two workers from starting the same job at the same time.

You need two safeguards:

  1. Result caching (return if completed)
  2. A lock/claim mechanism (only one worker can execute)

If you skip the second safeguard, you will still see duplicate inference during concurrency bursts.
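The difference between the two safeguards shows up under concurrency. This sketch starts several threads on the same job ID at once: the first worker only checks for a completed result, the second also claims the job under a lock before running inference. `fake_inference` is a hypothetical stand-in whose `sleep` widens the race window. The exact count for the check-only version depends on thread scheduling, so the sketch reports it rather than predicting it.

```python
import threading
import time
from typing import Callable, Dict, List, Set


def run_concurrently(worker: Callable[[str], None], n: int = 5) -> None:
    """Start n threads that all call worker("job-1") at the same moment."""
    barrier = threading.Barrier(n)

    def go() -> None:
        barrier.wait()  # release all threads together to provoke the race
        worker("job-1")

    threads = [threading.Thread(target=go) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def fake_inference() -> None:
    time.sleep(0.05)  # hypothetical inference latency


# Safeguard 1 only: return if completed.
calls_check_only: List[str] = []
results_check_only: Dict[str, str] = {}


def check_only(job_id: str) -> None:
    if job_id in results_check_only:
        return
    calls_check_only.append(job_id)
    fake_inference()
    results_check_only[job_id] = "done"


# Safeguards 1 and 2: return if completed, and claim before running.
calls_with_claim: List[str] = []
results_with_claim: Dict[str, str] = {}
claimed: Set[str] = set()
claim_lock = threading.Lock()


def with_claim(job_id: str) -> None:
    with claim_lock:  # the check and the claim happen atomically
        if job_id in results_with_claim or job_id in claimed:
            return
        claimed.add(job_id)
    calls_with_claim.append(job_id)
    fake_inference()
    results_with_claim[job_id] = "done"


run_concurrently(check_only)
run_concurrently(with_claim)
print("check-only inference calls:", len(calls_check_only))
print("with-claim inference calls:", len(calls_with_claim))
```

A `threading.Lock` only coordinates threads inside one process; across processes or machines, the claim has to be a conditional write in the shared job store.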

Closing Thoughts

AI systems are expensive in ways traditional systems aren’t. When duplicates happen in a normal tool, we might waste compute. When duplicates happen in an AI tool, we often waste real money.

Idempotency won’t make the model smarter, but it will make the system survivable, cost-effective, and scalable.


Opinions expressed by DZone contributors are their own.
