Securing the Software Supply Chain in the Age of AI Agent Swarms

As AI agents accelerate software delivery, teams need automated trust controls, signed provenance, and runtime enforcement to keep releases fast and verifiable.

Mar. 23, 26 · Analysis

If your team is using AI agents to write code, pick dependencies, or trigger builds, your delivery model is evolving fast, and your trust model needs to evolve with it. Agents are already improving quickly and can be a major force multiplier for engineering teams. But even with better models, trust decisions still need explicit controls: dependency validation, pipeline integrity checks, and artifact verification before runtime. At machine speed, those checks have to be automated, not optional.

I started paying attention to this after watching a team roll out an LLM-based coding assistant. Within weeks, their build logs had packages that were not in any approved registry. Nobody noticed because the builds were green. That experience convinced me that supply chain security is no longer something you bolt on later. It is part of how you ship software now.

What follows is the architecture I keep coming back to. It is not tied to any single vendor. It works on GitHub Actions, GitLab CI, Tekton, or anything else that supports OIDC. The core idea is simple: sign everything, attest everything, and block anything that cannot prove where it came from.

What Broke When Agents Entered the Picture

Before agents, CI/CD pipelines had a predictable trust model. A developer committed code, CI ran the build, and if something looked wrong, a human often caught it in review or staging. Agent-assisted workflows improve velocity, but they also change where errors can enter:

Dependency suggestion risk. An LLM can suggest a plausible but incorrect package name. If that name is registered by an attacker, the build may still pass unless dependency controls are in place.
Workflow change risk. Agent-generated updates to CI YAML can unintentionally remove a security step or introduce an untrusted action if review gates are weak.
Review lag at machine speed. By the time a human reviews the change, the artifact may already be built and promoted. Manual-only gates do not keep up with high automation.

The fix is not to slow agents down. It is to make verification automatic and mandatory.
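
One way to make the dependency check automatic is a CI step that fails the build when a resolved package is not on an approved list. The sketch below is a minimal illustration, not a specific tool: the APPROVED_PACKAGES set, the find_unapproved helper, and the pre-parsed lockfile format are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical allowlist; in practice this would be derived from your approved registry.
APPROVED_PACKAGES = {"requests", "urllib3", "pyyaml"}


def find_unapproved(resolved: Iterable[tuple[str, str]],
                    approved: set[str] = APPROVED_PACKAGES) -> list[str]:
    """Return resolved packages whose names are not on the allowlist."""
    return sorted(
        f"{name}=={version}"
        for name, version in resolved
        if name.lower() not in approved
    )


# "reqeusts" stands in for a plausible but wrong name an agent might suggest.
lockfile = [("requests", "2.32.3"), ("reqeusts", "0.0.1")]
violations = find_unapproved(lockfile)
if violations:
    # A real CI step would exit non-zero here instead of only printing.
    print("blocked:", ", ".join(violations))
```

The check runs on the resolved lockfile rather than on what the agent proposed, so it still catches a bad name when nobody reads the diff.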

The Attacks You Are Actually Designing Against

You do not need to model every possible attack. You need to cover the patterns that keep showing up:

Compromised build environment: Something bad gets injected during the build, but the output is still signed by your own infrastructure. The signature is valid. The artifact is not.
Tag or script swaps: You reference a mutable tag such as v2.1, or a remote install script by URL. Someone changes what that tag or URL points to. Your pipeline trusts it because the reference did not change.
Maintainer gone rogue: A trusted open-source contributor introduces malicious logic in build scripts or test fixtures. Months of clean contributions build the trust that makes this possible.
Hallucinated packages: An AI agent suggests a dependency name that does not exist yet. An attacker registers it and publishes malware. Your lockfile now includes it.

Every one of these gets through if your only check is "was it signed?" Signing tells you who signed it. It does not tell you how it was built, what went into it, or whether the build environment was clean.

Why SLSA Level 3 Is the Right Target

SLSA (Supply-chain Levels for Software Artifacts) gives you a concrete framework to reason about this. Most teams should aim for Level 3. Here is what each level actually means in practice:

Level 1: You have a scripted build and generate some metadata about it. Better than nothing, but easy to fake.
Level 2: Your build runs on a hosted CI service, and the provenance is signed. Harder to tamper with, but the CI job itself could still be modified.
Level 3: The build runs in an isolated, ephemeral environment, and the provenance is generated and signed by the build service, with the signing material kept out of reach of your own build steps. Forging it is hard even for someone who controls the repo.

Level 3 is the point where provenance actually means something in an incident. When someone asks, "How do we know this image is clean?", you can point to a cryptographic attestation that the build service produced independently.

How the Architecture Works

I think about this in two parts: what happens at build time and what happens at deploy time.

At Build Time

Your CI pipeline needs to produce three things alongside the artifact:

A cryptographic signature tied to a short-lived certificate from your CI platform's OIDC provider. No static keys stored in secrets. The certificate lives for about 10 minutes, signs the digest, and is gone. The signature stays verifiable afterward because the transparency log records that it was made while the certificate was still valid.
A provenance attestation that records the source repo, commit, workflow file, branch, and build type. This is what makes the signature meaningful.
A signed SBOM listing every dependency in the artifact. This is how you catch hallucinated packages after the fact and how you meet compliance requirements.

All three should be pushed to the same registry location as the image. If your signatures live in a separate store, verification breaks the moment that store is unreachable.
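
To make the provenance attestation concrete, here is a minimal sketch of the document's shape, assuming the in-toto Statement v1 envelope with a SLSA provenance v1 predicate. The build_provenance helper, the keys inside externalParameters, and every URL and ID in the usage example are illustrative assumptions. In a real pipeline the build service generates and signs this, not your own code.

```python
import hashlib
import json


def build_provenance(artifact: bytes, name: str, repo: str, commit: str,
                     workflow: str, ref: str, builder_id: str,
                     build_type: str) -> dict:
    """Assemble an unsigned in-toto statement carrying SLSA-style provenance."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject binds the statement to one exact artifact digest.
        "subject": [{"name": name,
                     "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                # Parameter names here are assumptions; they depend on the build type.
                "externalParameters": {
                    "repository": repo, "ref": ref, "workflow": workflow},
                "resolvedDependencies": [
                    {"uri": repo, "digest": {"gitCommit": commit}}],
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


# Hypothetical values, for illustration only.
statement = build_provenance(
    artifact=b"demo image bytes",
    name="registry.example.com/app",
    repo="https://github.com/example-org/app",
    commit="0123456789abcdef0123456789abcdef01234567",
    workflow=".github/workflows/release.yml",
    ref="refs/heads/main",
    builder_id="https://ci.example.com/builders/container",
    build_type="https://example.com/build-types/container/v1",
)
print(json.dumps(statement, indent=2))
```

The source repo, commit, workflow file, and branch all live in fields a deploy-time policy can compare against expected values, which is what gives the signature its meaning.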

At Deploy Time

Your Kubernetes admission controller (or equivalent gate) checks every Pod before it runs:

Is the signer identity correct? The OIDC issuer and subject must match your approved CI workflow.
Does the provenance match policy? The build type, source repo, workflow file, and branch must all match what you expect.
Is the image pinned to a digest? If someone deploys by tag, the controller resolves it to a digest and rewrites the Pod spec to use that digest. Tags move. Digests do not.

If any check fails, the Pod is rejected. That is the whole point. Evidence at build time is useless if nothing enforces it at deploy time.
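
The three checks can be sketched as a single decision function. This is a simplified model, not any policy engine's API: the Claims type and check_pod are hypothetical names, and the sketch assumes cryptographic verification of the signature and attestation has already been done by a real verifier, so it only compares the verified claims with what policy expects. It also assumes tags have already been resolved, so an unpinned reference counts as a failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claims:
    issuer: str       # OIDC issuer taken from the signing certificate
    subject: str      # signer identity, for example a CI workflow reference
    build_type: str
    source_repo: str
    workflow: str
    branch: str


def check_pod(image_ref: str, verified: Optional[Claims],
              expected: Claims) -> list[str]:
    """Return the reasons to reject; an empty list means the Pod is admitted."""
    failures = []
    if "@sha256:" not in image_ref:
        failures.append("image is not pinned to a digest")
    if verified is None:
        failures.append("no verified signature or provenance for image")
        return failures
    if (verified.issuer, verified.subject) != (expected.issuer, expected.subject):
        failures.append("signer identity does not match the approved workflow")
    got = (verified.build_type, verified.source_repo, verified.workflow, verified.branch)
    want = (expected.build_type, expected.source_repo, expected.workflow, expected.branch)
    if got != want:
        failures.append("provenance does not match policy")
    return failures


# Hypothetical policy values, for illustration only.
expected = Claims(
    issuer="https://token.actions.githubusercontent.com",
    subject="https://github.com/example-org/app/.github/workflows/release.yml@refs/heads/main",
    build_type="https://example.com/build-types/container/v1",
    source_repo="https://github.com/example-org/app",
    workflow=".github/workflows/release.yml",
    branch="refs/heads/main",
)
print(check_pod("registry.example.com/app:v2.1", expected, expected))
# prints ['image is not pinned to a digest']
```

Returning every failure reason, rather than stopping at the first, makes rejected deployments easier to debug.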

What Your Admission Policy Needs to Do

The specific policy engine does not matter much: Kyverno, Gatekeeper, native ValidatingAdmissionPolicy, or something custom can all work. What matters is that the policy does these things:

Deny by default in any namespace that runs production workloads.
Allow only images whose signer identity matches an approved CI workflow.
Require a valid provenance attestation with the correct build type and source context.
Resolve tags to digests at admission time so the Pod spec always contains an immutable reference.
Provide an exception path for emergencies that is scoped, time-limited, and logged.

The most common mistake I see is teams that generate all the right evidence but never flip the policy from audit mode to enforce. Evidence you do not enforce is just overhead.
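
The difference between audit and enforce, and the shape of a scoped exception, can be shown in a few lines. This is a sketch under stated assumptions: BreakGlass, decide, and the returned strings are hypothetical names, and a real controller would send the alert to your paging or logging system rather than printing it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BreakGlass:
    namespace: str        # scope: one namespace
    image_digest: str     # scope: one exact image
    approved_by: str
    expires_at: datetime  # time limit


def decide(namespace: str, image_digest: str, evidence_ok: bool,
           exceptions: list, now: datetime, enforce: bool = True) -> str:
    """Deny by default; allow only on valid evidence or a live, in-scope exception."""
    if evidence_ok:
        return "allow"
    for ex in exceptions:
        in_scope = ex.namespace == namespace and ex.image_digest == image_digest
        if in_scope and now < ex.expires_at:
            # Every use of the exception path is logged.
            print(f"ALERT: break-glass approved by {ex.approved_by} used in {namespace}")
            return "allow-exception"
    # In audit mode the Pod still runs; only enforce mode actually blocks it.
    return "deny" if enforce else "audit-warning"


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
grant = BreakGlass("prod", "sha256:abc", "oncall-lead", now + timedelta(hours=1))
print(decide("prod", "sha256:abc", False, [grant], now))
print(decide("prod", "sha256:abc", False, [grant], now + timedelta(hours=2)))
```

The first call is admitted through the exception and raises an alert. The second is denied because the grant has expired.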

Handling Multi-Agent Pipelines

If you are running agent swarms where different agents handle different stages, a single attestation on the final artifact is not enough. Even when each agent is operating as designed, errors or unsafe outputs introduced early can propagate downstream. Final provenance confirms how the artifact was built, but it says little about the upstream handoffs that shaped it.

The better approach is to attest at each handoff:

The planning agent records what task it scoped and what constraints it set.
The coding agent records what files it generated and what model produced them.
The dependency agent records the resolved lockfile and which allowlist it checked against.
The build agent produces standard provenance and an SBOM.

At deploy time, admission can verify the full chain or just the stages that matter for a given workload. If the dependency stage attestation is missing or invalid, the deployment is blocked even if the build attestation looks fine.
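
A chain check of this kind can be sketched as follows. The stage names, the StageAttestation type, and linking stages by input and output digests are assumptions made for the example; the text above only requires that each handoff be attested. The sketch also assumes each attestation's signature has already been verified.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical stage names, one per agent handoff.
REQUIRED_STAGES = ("plan", "code", "dependencies", "build")


@dataclass(frozen=True)
class StageAttestation:
    stage: str
    input_digest: str   # digest of what the stage received ("" for the first stage)
    output_digest: str  # digest of what the stage handed on


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chain(attestations, required=REQUIRED_STAGES) -> list:
    """Return problems found; an empty list means every handoff is accounted for."""
    by_stage = {a.stage: a for a in attestations}
    problems = []
    previous_output = ""
    for stage in required:
        att = by_stage.get(stage)
        if att is None:
            problems.append(f"missing attestation for stage '{stage}'")
            previous_output = None  # the next link cannot be checked
            continue
        if previous_output is not None and att.input_digest != previous_output:
            problems.append(f"stage '{stage}' input does not match previous output")
        previous_output = att.output_digest
    return problems


plan = StageAttestation("plan", "", digest(b"task scope"))
code = StageAttestation("code", plan.output_digest, digest(b"generated files"))
deps = StageAttestation("dependencies", code.output_digest, digest(b"lockfile"))
build = StageAttestation("build", deps.output_digest, digest(b"image"))

print(verify_chain([plan, code, deps, build]))
print(verify_chain([plan, code, build]))
```

The second call reports the missing dependency stage even though the build attestation itself is intact, which is the behavior described above.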

Patterns That Work at Scale

Group policies by artifact type, not by service. Base images, internal services, and third-party containers have different risk profiles. Write one policy per category, not one per microservice.
Separate signer identities by environment. Your staging pipeline and production pipeline should not share the same signing identity. If staging is compromised, production stays protected.
Keep audit windows short. Run in audit mode for a week or two while you tune, then enforce. Teams that stay in audit mode forever never actually get the security benefit.
Treat policy changes like code. Store them in Git, require reviews, deploy through the same GitOps flow as everything else.
Build an emergency bypass that hurts. A break-glass path should exist, but it should be scoped to specific roles, expire automatically, and generate alerts. If it is easy to use, people will use it instead of fixing the real problem.
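
Grouping by artifact type can be as simple as matching the image path against a short ordered list of patterns. The registry paths and category names below are hypothetical, and real engines have their own matching syntax; this only illustrates the idea of a few categories instead of one policy per service.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical categories keyed by registry path; first match wins.
POLICY_CATEGORIES = (
    ("base-images", "registry.example.com/base/*"),
    ("internal-services", "registry.example.com/services/*"),
    ("third-party", "registry.example.com/mirror/*"),
)


def category_for(image: str) -> Optional[str]:
    """Return the policy category for an image, or None if nothing matches."""
    for category, pattern in POLICY_CATEGORIES:
        if fnmatchcase(image, pattern):
            return category
    return None  # no category means no policy allows it: deny by default


print(category_for("registry.example.com/services/payments@sha256:abc"))
print(category_for("docker.io/library/nginx:latest"))
```

A new microservice under the services path inherits the existing policy without anyone writing a new one.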

Mistakes I Keep Seeing

Deploying by tag instead of digest. You sign digest A. CI pushes a new build and moves the tag. Kubernetes pulls the tag, gets digest B, and the signature does not match. This is the number one "it was working yesterday" issue.
Signing everything but enforcing nothing. Teams set up Cosign, generate provenance, push attestations, and then never turn on the admission policy. All that work is wasted if nothing checks it.
Writing one policy per service. This does not scale past about 20 services. Use wildcards at the org or repo level and let repository access controls handle the rest.
No plan for air-gapped or private environments. If your signing flow depends on a public OIDC provider and a public transparency log, it breaks in regulated environments. Plan for private signing infrastructure or a static-key fallback early.

If you fix just one thing, fix the tag-vs-digest problem. It clears up more confusion than anything else.
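
The failure is easy to reproduce in a toy model. In this sketch a dictionary stands in for the registry's tag mapping and a set stands in for the signatures you have published; both, along with the is_signed helper, are illustrative assumptions rather than a real registry client.

```python
import hashlib


def digest_of(content: bytes) -> str:
    return "sha256:" + hashlib.sha256(content).hexdigest()


def is_signed(reference: str, tags: dict, signed: set) -> bool:
    """Resolve a tag to its current digest if needed, then look for a signature."""
    resolved = tags.get(reference, reference)
    return resolved in signed


tags = {}       # tag -> digest; the registry lets this mapping change
signed = set()  # digests that have a valid signature

# You sign digest A and tag it.
build_a = digest_of(b"build A")
tags["app:v2.1"] = build_a
signed.add(build_a)

# CI pushes a new build and moves the tag.
build_b = digest_of(b"build B")
tags["app:v2.1"] = build_b

print(is_signed("app:v2.1", tags, signed))  # prints False: the tag now resolves to digest B
print(is_signed(build_a, tags, signed))     # prints True: the digest reference never moved
```

Deploying by digest means the thing you verified and the thing you run are the same bytes, regardless of what happens to the tag afterward.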

What Is Coming

LLM-generated policies: Useful for first drafts, but do not auto-promote them. Treat generated policies the same as generated code: review before merge.
Post-quantum signing: NIST finalized its first post-quantum standards (FIPS 203, 204, and 205) in 2024, and the last two cover digital signatures. The signing ecosystems are working on support. Pick tools that can rotate algorithms without redesigning your pipeline.
Built-in Kubernetes admission: ValidatingAdmissionPolicy with CEL is maturing fast. Expect less reliance on webhook-based engines over time, which means lower latency and fewer moving parts.

Wrapping Up

If you are shipping software in an environment where agents participate in the pipeline, supply chain security becomes core engineering hygiene. It is how you preserve trust while benefiting from automation.

The architecture is not complicated: keyless signing with short-lived certs, provenance that the build service generates independently, signed SBOMs, and an admission controller that rejects anything without valid evidence. If you get those four pieces working together, you move from "we think this image is fine" to "we can prove it." That is a meaningful difference when something goes wrong.

Opinions expressed by DZone contributors are their own.
