Rethinking the Software Supply Chain for Agents
Today’s CI/CD pipelines aren’t built for AI. To make agentic systems reliable and trustworthy, we must evolve from continuous integration to continuous intelligence.
A recent MIT study reported that only about 5% of GenAI applications are creating real, measurable business value. In my opinion, that’s not a failure of ambition. If anything, most teams are experimenting aggressively. The issue is that the underlying systems we use to deliver software haven’t adapted to what AI actually is.
It has become incredibly easy to build a prototype or demo. A few prompt tweaks, an API call, and you can show something impressive. But turning that prototype into something you can trust in production is a different challenge. That part requires real engineering: reliability, consistency, versioning, monitoring, and guardrails. The problem is that the tools and workflows we’ve relied on for years were never designed to support systems that change their behavior over time.
Continuous integration and continuous delivery (CI/CD) pipelines were built to test code. They answer questions like:
- Does this function return the expected result?
- Does the application deploy cleanly?
But AI systems and agents don’t behave like static code. Their behavior can vary based on context, data, and prompts. So the real question becomes: Can this system make decisions we trust, even when conditions shift?
In this article, I walk through some ideas for rethinking the software supply chain so that it can support agentic systems.
Rethink the Software Supply Chain
We need to accept that the way we’ve built and delivered software for years no longer fits the world of intelligent, agentic use cases.
Today, CI/CD pipelines focus on checking whether the code works. With agentic systems, the focus shifts to understanding how agents behave, adapt, and make decisions in complex environments.
To make that possible, our software supply chain needs to move beyond continuous integration and evolve toward delivering continuous intelligence about the software being built and delivered. In the new pipeline, every step should help us learn about and evaluate the agent so that we can trust it more.
We need to see the software supply chain as something alive that constantly learns, improves, and evolves with the products it supports.
Evolve From Continuous Integration to Continuous Intelligence
Our goal for CI in the agentic era has changed.
- From: “Is this code good enough to merge?”
- To: “Can this intelligence be trusted to act reliably?”
The move from continuous integration to continuous intelligence changes how we think about building and trusting software. Software delivery pipelines were designed to check if code runs correctly, but agentic systems require us to validate how the system behaves in real-world, unpredictable conditions.
LLMs are non-deterministic: the same input can produce different outputs. Yet as we evolve the software, we need to ensure that its behavior keeps improving. The challenge is to deliver reliability and consistency in software that is built on a non-deterministic foundation.
As a software community, we need to evolve our continuous integration pipelines so that they help us confirm the software is reliable now and remains reliable over time.
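One way to make "reliable despite non-determinism" measurable is to run the same prompt many times and track the fraction of acceptable answers, rather than asserting on a single output. The sketch below is only an illustration: `pass_rate`, `make_flaky_agent`, the prompt, and the check are all hypothetical names and values, and the flaky agent is a stand-in for a real LLM-backed agent.

```python
import random
from typing import Callable


def pass_rate(agent: Callable[[str], str],
              prompt: str,
              check: Callable[[str], bool],
              trials: int = 20) -> float:
    """Run the same prompt several times and return the fraction of acceptable answers."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = sum(1 for _ in range(trials) if check(agent(prompt)))
    return passed / trials


def make_flaky_agent(seed: int, error_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM-backed agent that answers wrongly some of the time."""
    rng = random.Random(seed)

    def agent(prompt: str) -> str:
        if rng.random() < error_rate:
            return "I am not sure"
        return "The capital of France is Paris"

    return agent


def mentions_paris(output: str) -> bool:
    """Acceptance check for this one prompt (an assumption for the example)."""
    return "Paris" in output


agent = make_flaky_agent(seed=7, error_rate=0.1)
rate = pass_rate(agent, "What is the capital of France?", mentions_paris, trials=50)
print(f"pass rate over 50 trials: {rate:.2f}")
```

A pipeline could then compare the pass rate against a minimum, and against the previous release, instead of treating one passing run as proof.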
Evals Are the New Unit Tests
In the world of agentic systems, we need to think of evals the way we think of unit tests for non-agentic software. They go beyond checking whether something works to measuring how well it performs in terms of capability, reliability, and safety. Evals help determine not just whether a model produces the right output, but whether it behaves consistently and can be trusted in real scenarios.
They can run offline before release, online during rollout, or continuously in production, providing ongoing feedback about system behavior. In essence, evals bring together automated testing and runtime observability to create an ongoing loop of assessment and improvement for intelligent systems.
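To illustrate how an eval differs from a unit test, here is a minimal sketch of an eval case that returns scores instead of a single pass or fail. Everything in it is an assumption made for the example: the `EvalCase` structure, the keyword-based scoring, and the sample prompt. Real evals often use richer scorers, such as model-graded rubrics.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    prompt: str
    required_terms: list[str]  # capability: what a good answer must mention
    forbidden_terms: list[str] = field(default_factory=list)  # safety: what it must never say


def score_case(case: EvalCase, response: str) -> dict[str, float]:
    """Score one response between 0.0 and 1.0 on capability and safety."""
    text = response.lower()
    hits = sum(term.lower() in text for term in case.required_terms)
    capability = hits / len(case.required_terms) if case.required_terms else 1.0
    unsafe = any(term.lower() in text for term in case.forbidden_terms)
    return {"capability": capability, "safety": 0.0 if unsafe else 1.0}


case = EvalCase(
    prompt="How do I reset my password?",
    required_terms=["reset link", "email"],
    forbidden_terms=["password is"],
)
print(score_case(case, "We will email you a reset link."))
print(score_case(case, "Your password is hunter2"))
```

Because the result is a set of scores, the same case can feed a CI threshold, a rollout decision, or a production dashboard.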
Integrate Evals Into the Delivery Chain
Integrating evals into the delivery chain ensures that agentic software is continuously validated throughout the software lifecycle.
In the CI stage, offline evals verify core thresholds before code moves forward. During CD, progressive delivery is guided by eval scores that indicate performance and reliability in real scenarios.
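The two stages above can be sketched as a pair of small functions: a CI gate that blocks a build when an offline eval score falls below its threshold, and a rollout step that widens traffic only while the eval score holds. The metric names, threshold values, and step size are hypothetical, chosen only to make the example concrete.

```python
def failed_thresholds(scores: dict[str, float],
                      thresholds: dict[str, float]) -> list[str]:
    """Return the metrics that are missing or below their minimum threshold."""
    return sorted(
        name for name, minimum in thresholds.items()
        if scores.get(name, float("-inf")) < minimum
    )


def next_traffic_share(current: float,
                       eval_score: float,
                       min_score: float = 0.9,
                       step: float = 0.25) -> float:
    """Widen the rollout by one step while the eval score holds; otherwise drop to zero."""
    if eval_score < min_score:
        return 0.0
    return min(1.0, current + step)


# CI stage: offline evals must clear every threshold before the build moves forward.
offline_scores = {"accuracy": 0.93, "safety": 0.99}
thresholds = {"accuracy": 0.90, "safety": 1.00}
failures = failed_thresholds(offline_scores, thresholds)
print("blocked by:" if failures else "gate passed", failures)

# CD stage: the eval score from the current cohort decides the next traffic share.
print(next_traffic_share(current=0.25, eval_score=0.95))
```

In a real pipeline, a non-empty failure list would typically end the CI job with a non-zero exit code.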
Once deployed, always-on evals run in production to monitor issues such as model drift, bias, toxicity, and safety. By combining these layers, teams can make informed promotion or rollback decisions based on aggregate eval results, creating a delivery pipeline that learns, adapts, and maintains trust in every release.
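A promotion or rollback decision based on aggregate eval results might look like the sketch below. It assumes each always-on eval reports scores between 0.0 and 1.0, averages them per eval, and lets the weakest eval drive the decision. The function name, the two cut-off values, and the "weakest eval wins" policy are all assumptions for illustration, not a prescribed method.

```python
from statistics import mean


def release_decision(scores_by_eval: dict[str, list[float]],
                     promote_at: float = 0.9,
                     rollback_below: float = 0.7) -> str:
    """Aggregate per-eval scores and return 'promote', 'hold', or 'rollback'."""
    if not scores_by_eval or any(not values for values in scores_by_eval.values()):
        return "hold"  # not enough evidence to decide either way
    averages = {name: mean(values) for name, values in scores_by_eval.items()}
    worst = min(averages.values())  # the weakest eval drives the decision
    if worst < rollback_below:
        return "rollback"
    if worst >= promote_at:
        return "promote"
    return "hold"


production_scores = {
    "drift": [0.95, 0.97],
    "toxicity": [0.99, 1.0],
    "safety": [0.98],
}
print(release_decision(production_scores))
```

Using the weakest eval rather than an overall average keeps a strong score in one area from masking a regression in another.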

Treat Your Supply Chain as a Living System
In an agentic world, the software delivery process must evolve into a continuous feedback loop that behaves like a living system. Real user signals feed directly into inline evaluations, triggering reflections and automated improvement actions. When drift or performance degradation is detected, prompts and workflows need to be re-evaluated, and agents need continuous redeployments, ensuring systems stay aligned and reliable.
Over time, the system learns across data, prompts, and behavioral patterns, adapting to continuously deliver improving value. CI/CD is no longer a straight path from code to production. It becomes a self-healing loop where every interaction contributes to ongoing learning and refinement.
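As a rough sketch of the trigger in this loop, the monitor below keeps a rolling window of production eval scores and signals when their average falls too far below a baseline. The `DriftMonitor` class, the window size, and the tolerance are hypothetical; what happens after the signal (re-evaluating prompts, redeploying the agent) is left to the surrounding pipeline.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Rolling-window check of production eval scores against a baseline."""

    def __init__(self, baseline: float, window: int = 50, tolerance: float = 0.05):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores: deque[float] = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add one eval score; return True when re-evaluation should be triggered."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # wait until the window is full
        return mean(self.scores) < self.baseline - self.tolerance


monitor = DriftMonitor(baseline=0.9, window=3, tolerance=0.05)
for score in [0.9, 0.9, 0.9, 0.7]:
    if monitor.record(score):
        print(f"drift detected after score {score}: re-evaluate prompts and workflows")
```

Waiting for a full window avoids reacting to a single bad interaction, at the cost of detecting drift a little later.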
Conclusion
While we are still just getting started as a community, I am convinced that the future of software delivery depends on shifting our focus from code correctness to behavioral trust. The current CI/CD pipelines were designed for deterministic systems, but agentic and AI-driven applications demand new approaches built around continuous learning and assurance. Evals now serve as the new unit tests, helping teams measure reliability, performance, and safety at every stage of deployment.
By evolving the supply chain into a feedback system, organizations can create pipelines that not only deliver faster but also learn and adapt alongside their agents, ensuring every release is both intelligent and trustworthy.
Opinions expressed by DZone contributors are their own.