Reliable AI Agent Architecture for Mobile: Timeouts, Retries, and Idempotent Tool Calls
Ship reliable mobile agents: timeout everything, retry by error class, persist steps across restarts, and require idempotency keys for write tools.
Mobile is where “agent reliability” stops being a nice-to-have and turns into incident prevention.
On desktop or server environments, a flaky call is annoying. On mobile, it’s normal:
- networks drop mid-request
- users background the app
- the OS kills your process
- users tap twice because the UI looks stuck
- retries happen across app restarts
If your agent can call tools (APIs, payments, writes, device actions), you need an architecture that guarantees retries won’t duplicate side effects.
The Failure Mode You’re Shipping Today (Even if You Don’t Realize It)
A user says: “Send invoice to customer.”
Agent plan:
- createInvoice(customerId, amount)
- sendInvoice(invoiceId)
Mobile reality:
- request #1 succeeds on the server, but the response never reaches the phone
- the app retries after reconnect
- a duplicate invoice is created
- then sendInvoice runs twice
- the customer gets two invoices, and trust is gone
This is not an “agent problem.” It’s a mobile reliability plus side-effects problem.
Mobile agents aren’t chatbots; they’re workflows running on unreliable clients. If you don’t persist intent, classify tools, and enforce idempotency at the write boundary, retries become duplicates. The model isn’t the risk. The system is. Treat tool calls like payments: budgeted, replayable, and auditable — or you ship incidents by default.

The Mobile-Specific Constraints Your Architecture Must Respect
- Intermittent connectivity (timeouts are normal, not exceptional)
- Lifecycle interruption (background/foreground, configuration changes)
- Process death (your agent run can vanish mid-flight)
- UI impatience (double taps and repeated intents)
- Energy constraints (infinite retries drain battery and data)
So your design must be:
- deadline-driven
- persisted
- idempotent
- budgeted (attempts, tokens, tools)
Reference Architecture: Mobile Agent Reliability Stack
[Figure: Reference Architecture]
The key: idempotency is enforced at the boundary that performs the side effect (usually the backend), not just in the app.
1. Timeouts: Stop “Hanging Agents” and UI Retry Storms
Use two clocks.
A. Agent run deadline (global)
- “This agent run must finish within 10 seconds” (foreground)
- or “within 60 seconds” (background)
B. Tool call timeout (local)
- per tool (reads shorter, writes longer)
- includes network and server processing
Practical rules for mobile:
- Foreground agent run: 8–15 seconds total budget
- Per tool call: 1–6 seconds, depending on tool type
- Always support cancel (user taps cancel, app backgrounds)
If the user can’t tell whether anything is happening, they will trigger duplicates.
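One way to combine the two clocks is to cap every tool timeout by whatever is left of the run. The following is a minimal Kotlin sketch, not a definitive implementation: the names `ToolKind`, `RunDeadline`, and `effectiveTimeoutMs` are hypothetical, and the per-tool values are simply picked from the ranges above.

```kotlin
// Hypothetical types for illustration; the names are not from any library.
enum class ToolKind(val timeoutMs: Long) {
    READ(2_000),   // reads get a shorter local timeout
    WRITE(6_000)   // writes get a longer one
}

/** Global clock: one absolute deadline for the whole agent run. */
class RunDeadline(startMs: Long, budgetMs: Long) {
    val deadlineAtMs: Long = startMs + budgetMs

    /** Milliseconds left in the run, never negative. */
    fun remainingMs(nowMs: Long): Long = (deadlineAtMs - nowMs).coerceAtLeast(0)
}

/**
 * Local clock: the timeout for one tool call is the smaller of the per-tool
 * timeout and whatever is left of the run. Returns null when the run budget
 * is exhausted, meaning the call should not be started at all.
 */
fun effectiveTimeoutMs(kind: ToolKind, run: RunDeadline, nowMs: Long): Long? {
    val remaining = run.remainingMs(nowMs)
    return if (remaining == 0L) null else minOf(kind.timeoutMs, remaining)
}
```

With a 10-second foreground budget, a write that starts 9 seconds in gets only 1 second. The returned value would typically be handed to whatever enforces the timeout in your stack, such as the HTTP client's call timeout or a coroutine timeout.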
2. Retries: Not “Retry Everything,” but “Retry Safely”
Classify tools by side-effect risk.
Read-only (safe to retry)
- fetch recommendations
- retrieve invoice status
- get policy info
Write, idempotent (retry only with idempotency key)
- create invoice
- update profile
- create draft
Irreversible or expensive (never automatic)
- send money
- submit a charge
- send final message or email to a customer
- delete data
Important: retries should be driven by error category, not “any exception.”
Retry:
- network timeout
- DNS failure
- 502/503
- connection reset
Don’t retry:
- validation errors
- authentication failures
- permission denied
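The tool classes and error categories above can be combined into a single retry decision. This is a rough Kotlin sketch under stated assumptions: all type and function names are hypothetical, the mapping of 400/422 to validation errors is an assumption about the backend, and jitter is left out of the backoff to keep the example deterministic.

```kotlin
import java.net.SocketException
import java.net.SocketTimeoutException
import java.net.UnknownHostException

// Hypothetical names for illustration.
enum class ToolClass { READ_ONLY, WRITE_IDEMPOTENT, IRREVERSIBLE }

enum class ErrorCategory(val retryable: Boolean) {
    NETWORK_TIMEOUT(true),
    DNS_FAILURE(true),
    CONNECTION_RESET(true),
    SERVER_UNAVAILABLE(true),   // HTTP 502 / 503
    VALIDATION(false),
    AUTHENTICATION(false),
    PERMISSION_DENIED(false),
    UNKNOWN(false)              // unclassified errors are not retried
}

/** Maps an HTTP status code to a category; null means the call succeeded. */
fun categorizeStatus(status: Int): ErrorCategory? = when (status) {
    in 200..299 -> null
    502, 503 -> ErrorCategory.SERVER_UNAVAILABLE
    401 -> ErrorCategory.AUTHENTICATION
    403 -> ErrorCategory.PERMISSION_DENIED
    400, 422 -> ErrorCategory.VALIDATION // assumption about the backend's codes
    else -> ErrorCategory.UNKNOWN
}

/** Maps a transport exception to a category. */
fun categorizeException(error: Throwable): ErrorCategory = when (error) {
    is SocketTimeoutException -> ErrorCategory.NETWORK_TIMEOUT
    is UnknownHostException -> ErrorCategory.DNS_FAILURE
    is SocketException -> ErrorCategory.CONNECTION_RESET // covers resets among other socket errors
    else -> ErrorCategory.UNKNOWN
}

/** Retries depend on the error category AND the tool's side-effect class. */
fun shouldRetry(
    toolClass: ToolClass,
    category: ErrorCategory,
    hasIdempotencyKey: Boolean,
    attempt: Int,
    maxAttempts: Int
): Boolean {
    if (!category.retryable || attempt >= maxAttempts) return false
    return when (toolClass) {
        ToolClass.READ_ONLY -> true
        ToolClass.WRITE_IDEMPOTENT -> hasIdempotencyKey
        ToolClass.IRREVERSIBLE -> false // never automatic; hand the decision to the user
    }
}

/** Exponential backoff with a cap; attempt is 1-based. */
fun backoffMs(attempt: Int, baseMs: Long = 500, capMs: Long = 8_000): Long =
    minOf(capMs, baseMs shl (attempt - 1).coerceIn(0, 20))
```

Treating unknown errors as non-retryable is a deliberately conservative default: a new failure type has to be classified before it can trigger automatic retries.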
3. Idempotent Tool Calls: The Only Cure for Duplicate Side Effects
The rule
Every tool call that can mutate state must include an idempotency key.
Think of it as an “exactly-once outcome” built on top of at-least-once delivery.
Key design (mobile-safe)
Make the idempotency key:
- unique per intended action (not per attempt)
- stable across retries and app restarts
A good format:
idempotencyKey = hash(sessionId + toolName + logicalActionId)
Where logicalActionId is something like:
- invoiceDraft:customerId:amount:currency:timestampBucket
- or a UUID created once and persisted
Persist before you call:
On mobile, generate the tool call record and persist it first, then execute.
If the app dies mid-flight, you can resume without creating a new action.
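As an illustration of the key format above, here is a small Kotlin sketch that derives the key with SHA-256 from the JDK. The function names are hypothetical, and the choice of SHA-256 and of a separator character are assumptions; any stable hash over the same inputs would serve.

```kotlin
import java.security.MessageDigest
import java.util.UUID

/** Created once per intended action and persisted before the first attempt. */
fun newLogicalActionId(): String = UUID.randomUUID().toString()

/**
 * Derives the idempotency key from stable inputs only. The attempt number is
 * deliberately not an input, so every retry of the same action sends the same key.
 */
fun idempotencyKey(sessionId: String, toolName: String, logicalActionId: String): String {
    // A separator keeps ("ab", "c") and ("a", "bc") from hashing to the same value.
    val input = listOf(sessionId, toolName, logicalActionId).joinToString("\u001F")
    val digest = MessageDigest.getInstance("SHA-256").digest(input.toByteArray(Charsets.UTF_8))
    return digest.joinToString("") { "%02x".format(it) }
}
```

The important property is that the same persisted `logicalActionId` always yields the same key, while a genuinely new action (a new ID) yields a different one.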
4. Persisted State Machine: Retries that Survive Process Death
Model your agent execution as a step state machine stored in Room or DataStore:
- PENDING
- RUNNING
- SUCCEEDED
- FAILED_RETRYABLE
- FAILED_TERMINAL
- CANCELLED
And per tool call:
- attemptCount
- lastErrorCategory
- nextRetryAt
- deadlineAt
- idempotencyKey
This is how you stop “retrying from scratch” after a restart.
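The states and fields above could be modeled roughly as follows. This is an in-memory Kotlin sketch with hypothetical names (`StepRecord`, `canTransition`, `recoverAfterRestart`); in an app the record would be a persisted row, and the recovery rule shown is one reasonable policy rather than the only one.

```kotlin
// Hypothetical in-memory model; in an app these fields would live in a
// persisted record (for example a Room entity).
enum class StepStatus { PENDING, RUNNING, SUCCEEDED, FAILED_RETRYABLE, FAILED_TERMINAL, CANCELLED }

data class StepRecord(
    val idempotencyKey: String,
    val status: StepStatus,
    val attemptCount: Int,
    val deadlineAtMs: Long,
    val nextRetryAtMs: Long = 0,
    val lastErrorCategory: String? = null
)

/** Legal transitions; SUCCEEDED, FAILED_TERMINAL and CANCELLED are final. */
fun canTransition(from: StepStatus, to: StepStatus): Boolean = when (from) {
    StepStatus.PENDING -> to in setOf(StepStatus.RUNNING, StepStatus.CANCELLED)
    StepStatus.RUNNING -> to in setOf(
        StepStatus.SUCCEEDED, StepStatus.FAILED_RETRYABLE,
        StepStatus.FAILED_TERMINAL, StepStatus.CANCELLED
    )
    StepStatus.FAILED_RETRYABLE -> to in setOf(
        StepStatus.RUNNING, StepStatus.FAILED_TERMINAL, StepStatus.CANCELLED
    )
    else -> false
}

/**
 * Called for each persisted step when the app starts again. A step found in
 * RUNNING was interrupted by process death, so its outcome is unknown. It is
 * scheduled for retry with the SAME idempotency key instead of being
 * re-planned from scratch. Past the deadline or attempt cap, the client
 * stops retrying.
 */
fun recoverAfterRestart(step: StepRecord, nowMs: Long, maxAttempts: Int): StepRecord = when {
    step.status != StepStatus.RUNNING -> step
    nowMs >= step.deadlineAtMs || step.attemptCount >= maxAttempts ->
        step.copy(status = StepStatus.FAILED_TERMINAL)
    else -> step.copy(status = StepStatus.FAILED_RETRYABLE, nextRetryAtMs = nowMs)
}
```

Because the recovered record keeps its original `idempotencyKey`, the retry after a restart is indistinguishable, from the backend's point of view, from a retry within the same process.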
5. The Tool Gateway Pattern (Mobile ↔ Backend)
For anything that writes server state, prefer a single backend gateway that:
- validates typed arguments
- attaches auth context
- enforces idempotency
- records audit and trace data
- returns stable results for the same idempotency key
Backend contract (what you want)
If the mobile client sends the same (toolName, idempotencyKey) again:
- the backend returns the original result
- without re-executing side effects
This is the difference between “retries” and “duplicates.”
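To make the contract concrete, here is a toy in-memory stand-in for the gateway, written in Kotlin for consistency with the rest of the article. `IdempotentGateway` and `GatewayResult` are hypothetical names; a real backend would keep the result table in a durable store with an expiry policy, which this sketch does not attempt.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical in-memory stand-in for a backend gateway.
data class GatewayResult(val body: String, val replayed: Boolean)

class IdempotentGateway {
    // Stored results, keyed by (toolName, idempotencyKey).
    private val results = ConcurrentHashMap<Pair<String, String>, String>()

    /**
     * Runs [sideEffect] at most once per (toolName, idempotencyKey). A repeat
     * request gets the stored result and the side effect is not executed again.
     * If the side effect throws, nothing is stored and a later retry runs it again.
     */
    fun execute(toolName: String, idempotencyKey: String, sideEffect: () -> String): GatewayResult {
        var executed = false
        // computeIfAbsent is atomic per key, so concurrent duplicates run the block once.
        val body = results.computeIfAbsent(toolName to idempotencyKey) {
            executed = true
            sideEffect()
        }
        return GatewayResult(body, replayed = !executed)
    }
}
```

Sending the same `createInvoice` request twice with the same key would create one invoice and return the same body both times, with `replayed` set on the second response.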
6. Kotlin-ish Sketch: Tool Call Envelope + Idempotency Header
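    import okhttp3.MediaType.Companion.toMediaType
    import okhttp3.Request
    import okhttp3.RequestBody.Companion.toRequestBody

    // One persisted record per intended tool call.
    data class ToolCall(
        val sessionId: String,
        val toolName: String,
        val logicalActionId: String, // generated once and persisted
        val idempotencyKey: String,  // derived from sessionId + toolName + logicalActionId
        val attempt: Int,
        val deadlineEpochMs: Long,
        val argsJson: String
    )

    // Placeholder result type; the shape is an assumption for this sketch.
    data class ToolResult(val httpStatus: Int, val bodyJson: String)

    interface ToolClient {
        suspend fun execute(call: ToolCall): ToolResult
    }

    // Example: attach the idempotency key to your API request.
    // (The header name varies by your backend convention.)
    fun buildRequest(call: ToolCall): Request =
        Request.Builder()
            .url("https://api.yourcompany.com/tools/${call.toolName}")
            .addHeader("Idempotency-Key", call.idempotencyKey)
            .addHeader("X-Session-Id", call.sessionId)
            // OkHttp 4.x extension functions; sends the arguments as a JSON body.
            .post(call.argsJson.toRequestBody("application/json".toMediaType()))
            .build()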
On Android, pair this with:
- persistent storage (Room) for ToolCall and status
- a worker or executor (foreground coroutine or WorkManager for background resumption)
7. Mobile UX Guardrails that Prevent Duplicate Intents
Reliability is also UX:
- show a “Working…” state immediately
- show the current step (“Creating invoice…”, “Sending…”)
- disable duplicate primary actions while a session is active
- provide Cancel and Retry explicitly
- if the agent resumes after a restart, show “Resuming your request…”
This reduces user-triggered duplication before idempotency even has to save you.
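The "disable duplicate primary actions" guardrail can be as small as a gate that admits one session at a time. A minimal Kotlin sketch follows; `ActiveSessionGate` is a hypothetical name and the class is not tied to any UI toolkit.

```kotlin
// Hypothetical guard for the primary action button.
class ActiveSessionGate {
    private var activeSessionId: String? = null

    /** Returns true if this tap starts a session, false if one is already running. */
    @Synchronized
    fun tryStart(sessionId: String): Boolean {
        if (activeSessionId != null) return false
        activeSessionId = sessionId
        return true
    }

    /** Called when the run reaches a final state; ignores stale or unknown session IDs. */
    @Synchronized
    fun finish(sessionId: String) {
        if (activeSessionId == sessionId) activeSessionId = null
    }

    /** The UI can bind the button's enabled state to this. */
    @Synchronized
    fun isIdle(): Boolean = activeSessionId == null
}
```

A second tap while a session is active returns false from `tryStart`, so the UI can ignore it or surface the current step instead of launching a duplicate run.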
8. What to Implement: Checklist
Must-have
- Agent run deadline and per-tool timeout
- Tool classification: read / write-idempotent / irreversible
- Idempotency key required for all write tools
- Persist tool call record before execution
- Retry policy by error category (not blanket retries)
Should-have
- Tool gateway on the backend for idempotency and audit
- Step state machine in Room or DataStore
- Budget caps: max tool calls, max retries, max wall time
- Telemetry: attempt counts, error categories, idempotency hit rate
Nice-to-have
- Exactly-once “commit” workflows for irreversible actions
- Shadow-mode (simulated) tool calls for evaluation and testing
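For the budget caps in the checklist, a single tracker that is consulted before every tool call is often enough. The Kotlin sketch below uses hypothetical names (`RunBudget`, `BudgetVerdict`) and illustrative default limits, and it assumes a retry also counts as a tool call.

```kotlin
// Hypothetical budget tracker; the default limits are illustrative.
enum class BudgetVerdict { OK, TOO_MANY_TOOL_CALLS, TOO_MANY_RETRIES, OUT_OF_TIME }

class RunBudget(
    private val startedAtMs: Long,
    private val maxToolCalls: Int = 8,
    private val maxRetries: Int = 3,
    private val maxWallTimeMs: Long = 15_000
) {
    private var toolCalls = 0
    private var retries = 0

    /**
     * Checks every cap before a call is sent and records the call only if it
     * is allowed. Assumption: a retry counts against both caps.
     */
    fun tryConsume(nowMs: Long, isRetry: Boolean): BudgetVerdict {
        val verdict = when {
            nowMs - startedAtMs >= maxWallTimeMs -> BudgetVerdict.OUT_OF_TIME
            toolCalls >= maxToolCalls -> BudgetVerdict.TOO_MANY_TOOL_CALLS
            isRetry && retries >= maxRetries -> BudgetVerdict.TOO_MANY_RETRIES
            else -> BudgetVerdict.OK
        }
        if (verdict == BudgetVerdict.OK) {
            toolCalls++
            if (isRetry) retries++
        }
        return verdict
    }
}
```

Any verdict other than `OK` is a signal to stop the run and tell the user, not to wait and try again.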
Closing
Mobile agents fail in production for the same reason mobile payments fail: side effects plus retries without idempotency.
If your agent can call tools, you’re building a distributed system.
Treat it like one: deadlines, budgets, persistence, and idempotent writes.
