ARC: The Architecture for Reasoning Control

AI apps fail from compounding randomness. Start small, add layered guardrails, and use AI for reasoning but code for execution to keep systems reliable.

Ananth Iyer

May. 06, 26 · Analysis


Three Lessons from an AI Makeathon

I recently participated in a makeathon focused on building AI-powered applications. Over 2–3 intense days, I watched teams go from idea to demo — and the patterns that separated working products from frustrated debugging sessions were remarkably consistent, especially for teams building AI agents.

From this makeathon and from my experience working with teams building AI applications and agents, here are the three lessons I took away on how to build reliable AI applications by engineering around non-determinism. Together, these form what I would like to call “The Architecture for Reasoning Control”.

1. Start Small — Non-Determinism Compounds

AI models are non-deterministic. The same input won’t always produce the same output. This is a feature when you want creativity. It’s a problem when you want reliability.

In a small app (one model call, one task), non-determinism is manageable. You can observe the model’s behavior, tune your prompts, and build confidence. You iterate fast and catch drift early.

In a large app like an AI agent, where the model must reason, select tools, and manage state across multiple steps, these errors compound. Every AI call is a roll of the dice. Chain ten of them together and you need all ten rolls to come up right. If each step succeeds independently with probability P(success), a run of n steps succeeds end to end with probability P(success)^n, which decays exponentially as n grows. Equivalently, the probability of at least one undesired result, 1 - P(success)^n, climbs toward certainty as the chain gets longer.
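To make the compounding concrete, here is a minimal sketch of the arithmetic. The 95% per-step success rate is an illustrative assumption, not a measurement from the makeathon, and the calculation assumes each step fails independently of the others.

```python
def chain_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that every one of n independent steps succeeds."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


# Assumed per-step success rate of 95% (illustrative only).
for n in (1, 5, 10, 20):
    p_ok = chain_success_probability(0.95, n)
    print(f"{n:>2} steps: success {p_ok:.3f}, at least one failure {1 - p_ok:.3f}")
```

Under these assumptions, a ten-step chain succeeds end to end only about 60% of the time, even though each individual step looks reliable on its own.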

In my experience building bigger AI agents, we often spend the majority of our time chasing unpredictable outputs across these long chains. By scoping small, we found we could build working demos and deployable applications that actually stay on the rails.

The Architectural Lesson: Apply the Single Responsibility Principle (SRP) from software architecture. An AI module should have one, and only one, reason to change. You can think of these modules as analogous to microservices: small, single-purpose AI units that can be composed safely. Get one agentic interaction working with high reliability before you dream of chaining it. If the foundation is shaky, the agentic skyscraper will fall.
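One way to picture a single-purpose AI unit is sketched below. The `AIUnit` class, its fields, and `fake_model` are hypothetical names invented for this illustration; `call_model` stands in for whatever LLM client you actually use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AIUnit:
    """One model-backed task with one prompt and one output contract."""
    name: str
    prompt_template: str               # expects a single {input} placeholder
    call_model: Callable[[str], str]   # hypothetical wrapper around your LLM client
    is_valid: Callable[[str], bool]    # deterministic check on the normalized output

    def run(self, user_input: str, max_attempts: int = 2) -> str:
        prompt = self.prompt_template.format(input=user_input)
        for _ in range(max_attempts):
            output = self.call_model(prompt).strip().lower()
            if self.is_valid(output):
                return output
        raise ValueError(f"{self.name}: no valid output after {max_attempts} attempts")


# Stand-in for a real model call, so the sketch runs without any API.
def fake_model(prompt: str) -> str:
    return " Positive\n"


sentiment = AIUnit(
    name="sentiment",
    prompt_template="Classify the sentiment of this review as positive or negative: {input}",
    call_model=fake_model,
    is_valid=lambda out: out in {"positive", "negative"},
)
print(sentiment.run("Great battery life."))
```

Because the unit owns exactly one prompt and one validity check, it can be tested and tuned in isolation before it is composed with anything else.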

2. Multipass Guardrails — Defense in Depth

Even the best guardrails we built didn’t have 100% effectiveness. A single validation pass catches most bad outputs — but “most” isn’t enough when you’re shipping to users.

To understand why, consider the full surface area you need to guard. Most teams think about content safety — blocking violent or illegal content. But that’s just one of six categories:

To make our guardrails more dependable and build true Defense-in-Depth, we experimented with a “double-pass” approach: running the same guardrail logic against both input and output. While this bumped our success rate slightly, it quickly revealed a structural flaw: correlated blind spots. When our detection logic misclassified an illegal query as merely “off-topic” at the input stage, it consistently made the same error at the output stage. Similarly, PII that bypassed the upstream filter sailed through downstream because the detection signature was identical. We realized that doubling down on the same logic only slightly increased our safety margin; mostly, it mirrored our existing weaknesses.

So we researched shifting from symmetrical filtering to a model built on orthogonal, independent layers. The goal was to ensure that if one layer failed, the next would approach the problem from a completely different technical angle. This “cops-and-robbers” dynamic makes it significantly less likely that failures align — requiring multiple, differently designed systems to fail simultaneously for an issue to reach the user.
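The difference between a symmetrical double pass and orthogonal layers can be shown with a toy example. Both detectors below are deliberately simplistic illustrations, not production PII detection, and all function names are hypothetical.

```python
import re

# Layer A: signature match for SSNs written with dashes.
SSN_DASHED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def dashed_ssn_layer(text: str) -> bool:
    return bool(SSN_DASHED.search(text))


# Layer B: a different angle. Collapse separators between digits,
# then flag any run of exactly nine digits.
def digit_run_layer(text: str) -> bool:
    collapsed = re.sub(r"(?<=\d)[\s.\-]+(?=\d)", "", text)
    return bool(re.search(r"(?<!\d)\d{9}(?!\d)", collapsed))


def double_pass(user_input: str, model_output: str) -> bool:
    """Same logic on input and output: it shares one blind spot."""
    return dashed_ssn_layer(user_input) or dashed_ssn_layer(model_output)


def orthogonal_stack(text: str) -> bool:
    """Two differently designed layers: both must miss for a leak to pass."""
    return dashed_ssn_layer(text) or digit_run_layer(text)


leak = "My number is 123 45 6789"
print("double pass flags it:     ", double_pass(leak, leak))
print("orthogonal stack flags it:", orthogonal_stack(leak))
```

The space-separated number slips past the dashed signature at both stages, while the second layer catches it because it looks at the problem differently.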

If you’re looking to move beyond simple “pass/fail” filters, here are layers you could consider stacking alongside your existing guardrail:

Dedicated Scanners (NER & Regex): Use deterministic PII scanners (regex for SSNs/credit cards) and Named Entity Recognition (NER) to catch data leaks before the query even hits the model.
Intent Routing: Use a fast, specialized classifier to bucket queries into “benign,” “ambiguous,” or “high-risk.” This allows you to route high-risk queries through stricter handling paths or specialized system prompts before they reach the primary generative model.
Structural Enforcement (JSON Schema): Move the goalposts from “free-text” to “data validation.” By forcing the model to output in a strict JSON schema, you turn unpredictable “Model Behavior” risks into a predictable code problem that can be caught by a standard parser.
LLM-as-a-Judge: Introduce a secondary, smaller “observer” model tasked purely with evaluating the primary model’s response against a different set of criteria.
Retrieval-Grounded Responses (RAG): Constrain the model to answer only from retrieved context, and validate that outputs are traceable to sources. This reduces hallucination and unsupported claims.
Confidence / Uncertainty Gating: Use signals (judge scores, validation checks, or model uncertainty) to decide when to answer, ask for clarification, or fall back, rather than treating all outputs equally.
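Two of the layers above, structural enforcement and confidence gating, can be sketched with the standard library alone. The output shape (`answer` plus a non-empty `sources` list), the `judge_score` input, and the thresholds are assumptions made for illustration; a real system would likely use a full JSON Schema validator and calibrated scores.

```python
import json
from typing import Optional


def parse_answer(raw: str) -> Optional[dict]:
    """Return the parsed object only if it matches the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    answer = data.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        return None
    sources = data.get("sources")
    if not isinstance(sources, list) or not sources:
        return None  # answers with no cited source are treated as invalid
    return data


def decide(raw: str, judge_score: float,
           answer_at: float = 0.8, clarify_at: float = 0.5) -> str:
    """Gate the response: 'answer', 'clarify', or 'fallback'."""
    if parse_answer(raw) is None:
        return "fallback"       # structural failure is a plain code path
    if judge_score >= answer_at:
        return "answer"
    if judge_score >= clarify_at:
        return "clarify"
    return "fallback"


good = '{"answer": "Paris", "sources": ["doc-1"]}'
print(decide(good, judge_score=0.9))
print(decide(good, judge_score=0.6))
print(decide("Paris, probably", judge_score=0.99))
```

Note that free text fails the parser regardless of how confident the judge is, which is the point of turning model behavior into a data validation problem.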

The overarching lesson was that there is no such thing as a “perfect” guardrail. Instead, assemble a stack of diverse, independent checks. By assuming that every individual layer will occasionally fail, you can design a system where those failures rarely align, creating a robust “Swiss Cheese” model of AI safety that holds up far better under adversarial pressure.
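The Swiss Cheese intuition can be put in numbers. The sketch below assumes each layer misses independently, which real layers only approximate, and the 10% miss rates are illustrative rather than measured.

```python
from math import prod
from typing import Sequence


def stack_miss_rate(miss_rates: Sequence[float]) -> float:
    """Chance a bad output slips past every layer, if layers fail independently."""
    if any(not 0.0 <= r <= 1.0 for r in miss_rates):
        raise ValueError("each miss rate must be between 0 and 1")
    return prod(miss_rates)


single = 0.10  # assumed miss rate of one layer
print(f"one layer:                {single:.4f}")
# Worst case for a repeated layer: its misses are fully correlated, so no gain.
print(f"same layer run twice:     {single:.4f}")
print(f"three independent layers: {stack_miss_rate([0.10, 0.10, 0.10]):.4f}")
```

The gain comes entirely from independence: the closer two layers are in design, the closer the stack behaves to a single layer.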

3. Flow Engineering — Mix AI with Deterministic Processing: Control What You Can

AI excels at ambiguity: reasoning over messy inputs, interpreting intent, and generating natural language. But for problems requiring guaranteed correctness — precise data lookups, workflow sequencing, or state management — it remains fundamentally probabilistic. It can often arrive at the right answer, but it cannot reliably guarantee it every time.

The insight that worked best: Use AI for reasoning; use deterministic code for execution.

Let AI decide what to do (intent, analysis, extraction). Then let code decide how to do it (orchestration, API calls, state management). This separation doesn’t just improve reliability — it fundamentally changes how the system behaves:

Controlled Scope: By limiting LLM calls to only the steps that require reasoning, you reduce unnecessary model invocations and keep the AI surface area small. This reinforces Lesson 1 — when the scope is smaller, the non-determinism is easier to observe.
Targeted Safety: It strengthens Lesson 2 — guardrails are most effective when applied to fewer, well-defined points rather than across an unbounded flow.

This is the “agentic pattern” emerging across the industry: a deterministic workflow engine that delegates to AI only where human-like reasoning is needed, then pulls the result back into controlled, predictable code. The best AI applications aren’t the ones that give AI the most freedom — they’re the ones that give AI the right freedom.

This is the core of Flow Engineering. Instead of letting an agent navigate a dark room, we hard-coded the rails. By using the LLM as a cognitive engine at specific steps in a verifiable chain — rather than a free-roaming driver — we replaced a porous process with a solid structural track.
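A minimal sketch of this split is shown below: the model is asked only to name the intent, and a fixed dispatch table decides what actually runs. The handler names, intent labels, and `fake_classifier` are hypothetical; in practice `classify_intent` would wrap a model call.

```python
from typing import Callable, Dict


# Deterministic execution: ordinary functions with predictable behavior.
def check_order_status(order_id: str) -> str:
    return f"status_lookup:{order_id}"


def start_refund(order_id: str) -> str:
    return f"refund_started:{order_id}"


# The only actions the system can take, fixed in code.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "order_status": check_order_status,
    "refund": start_refund,
}


def run_flow(message: str, order_id: str,
             classify_intent: Callable[[str], str]) -> str:
    # Reasoning step: the model only decides WHAT the user wants.
    intent = classify_intent(message).strip().lower()
    # Execution step: code decides HOW, and rejects anything off the rails.
    handler = HANDLERS.get(intent)
    if handler is None:
        return "escalate_to_human"
    return handler(order_id)


# Stand-in for a model-backed classifier, so the sketch runs offline.
def fake_classifier(message: str) -> str:
    return "refund" if "money back" in message.lower() else "chitchat"


print(run_flow("I want my money back", "A-17", fake_classifier))
print(run_flow("Nice weather today", "A-17", fake_classifier))
```

Whatever the model returns, it can only select from the actions in `HANDLERS`; an unexpected label falls through to escalation instead of triggering arbitrary behavior.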

Why This Works

Reliability: Deterministic systems eliminate randomness in mission-critical steps.
Cost & Latency: Fewer LLM calls lead to lower inference costs and faster responses.
Observability: A smaller AI surface area is easier to monitor, test, and debug.
Safety: Guardrails become more effective when applied at a few controlled, well-defined points.

You’re not just optimizing performance — you’re containing non-determinism.

Exec Insight: High-risk business logic should stay deterministic; creative and reasoning tasks can be probabilistic.

Conclusion

All three lessons point to the same principle: respect the non-determinism. The goal isn’t to eliminate it. It’s to build systems where it can’t break you, which is what “ARC: The Architecture for Reasoning Control” is for. AI systems don’t fail because they’re non-deterministic. They fail because that non-determinism is poorly bounded.

Don’t fight it. Don’t ignore it. Don’t pretend your model is a function that returns the same output every time. The teams that built the most impressive demos at the makeathon weren’t the ones with the most ambitious prompts. They were the ones who understood where AI helps — and where it doesn’t.

Summarizing using a Swiss Cheese metaphor:

Lesson 1 (Start Small): Shrink the “holes” in the cheese by limiting scope, using the architectural principle of SRP.
Lesson 2 (Orthogonal Defense-in-Depth): Stack the slices so the “holes” rarely align, by using orthogonal layers.
Lesson 3 (Flow Engineering): Reduce how much of the system is cheese in the first place by using deterministic flows for critical logic.

While our team was recognized with a special award, the real takeaway was the framework we discovered along the way.

Start small. Guard deep. Stay deterministic.

That’s what turns AI from a demo into a system you can trust.


Opinions expressed by DZone contributors are their own.
