The New Insider Threat Isn't Human: Securing AI Agents Before They Secure Themselves
AI agents are becoming powerful insiders. Learn how identity, MCP security, least privilege, and policy enforcement reduce emerging risks.
In mid-September 2025, engineers inside Anthropic's threat intelligence team noticed something that didn't fit the usual pattern of automated probing on their platform. Ten days of digging later, they had a name for it: GTG-1002, a Chinese state-sponsored group that had turned Claude Code into the operational core of a cyber-espionage campaign against roughly thirty organizations, among them banks, chemical manufacturers, tech firms, and government agencies.
When Anthropic published its account of the intrusion on November 14, the detail that made security teams sit up wasn't the target list. It was the autonomy ratio: by the company's own estimate, the AI agent executed somewhere between 80 and 90 percent of the operation — reconnaissance, vulnerability discovery, exploit development, lateral movement, exfiltration — with humans stepping in only at a handful of strategic checkpoints. Jacob Klein, who heads threat intelligence at Anthropic, called it an escalation that lowers the bar for who can run a sophisticated intrusion at all.
I've spent the better part of this year watching that bar keep dropping, one disclosure at a time. And the thing I keep coming back to is this: the security industry built thirty years of tooling around the assumption that the dangerous actor inside your network is a person — a careless employee, a disgruntled admin, a phished contractor. That assumption is now wrong often enough to be a liability. The dangerous actor increasingly has no payroll record, no badge, no manager to flag erratic behavior. It's a process. And it's already inside.
Skeleton Keys for Software
Here's the uncomfortable arithmetic. CyberArk's 2025 Identity Security Landscape study found machine identities now outnumber human ones by more than 80 to 1 inside the average enterprise, with AI specifically named as the biggest driver of new privileged accounts this year. Other measurements land in a wide band — Rubrik Zero Labs put it at 82 to 1, Entro Labs measured DevOps-heavy environments at 144 to 1 — but every credible estimate points in the same direction, and the gap is widening faster than anyone's governance program.
What makes this dangerous isn't the count. It's the habit. Most teams I've talked with over the past eighteen months reached for the path of least resistance when they first wired an agent into production: they handed it a copy of a human's API key, or a service account with the same standing privileges everyone else in that pipeline already had. It's the software equivalent of cutting a spare house key and leaving it under the mat: convenient until the day someone you never intended finds it.
That convenience is exactly what blew up Salesloft and its customers in August 2025. Attackers tracked as UNC6395 didn't breach Salesforce. They stole OAuth tokens belonging to Drift, a chatbot integration plugged into it, and used those long-lived, broadly scoped tokens to walk into Salesforce, Slack, AWS, and Google Workspace environments at more than 700 downstream organizations — Cloudflare and Google among them — over roughly a ten-day window. Nobody compromised the platform. They compromised the credential that the integration was trusted with, and that credential opened far more doors than the integration's actual job required. Swap "chatbot integration" for "AI agent," and you've described the exact failure mode every analyst is now warning about for 2026.
The fix that keeps surfacing in serious architecture conversations isn't exotic — it's the same zero-trust logic that's been preached at humans for a decade, finally pointed at software:
| | Skeleton-key model | Scoped-identity model |
|---|---|---|
| Credential | Copied human API key or shared service account | Unique identity per agent, issued via OAuth client credentials or a workload-identity standard like SPIFFE |
| Lifetime | Static, often unrotated for months or years | Short-lived, reissued per session or task |
| Blast radius if stolen | Everything that account can touch | Only what that specific agent was scoped to do |
| Auditability | "Someone" did this | This agent, acting on this task, did this |
None of this is theoretical anymore. Gartner is telling boards that by 2028, roughly a third of enterprise applications will carry embedded agentic AI, and 15 percent of day-to-day work decisions will be made without a human in the loop. You cannot run that volume of autonomous action on credentials designed for an employee who logs in, does a job, and logs out.
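To make the right-hand column of that table concrete, here is a minimal sketch of a per-agent, short-lived, scoped credential. It is a toy HMAC-signed token using only the Python standard library, standing in for what an OAuth client-credentials flow or a SPIFFE-style workload identity would issue in practice. The key, the function names `issue_token` and `check_token`, and the `crm:read` scope string are all assumptions made up for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key for the demo; a real issuer would hold this in a KMS or HSM.
SIGNING_KEY = b"demo-only-key"


def issue_token(agent_id, scopes, ttl_seconds=300, now=None):
    """Issue a short-lived token bound to one agent and an explicit scope list."""
    now = time.time() if now is None else now
    claims = {"sub": agent_id, "scopes": sorted(scopes), "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def check_token(token, required_scope, now=None):
    """Return the agent id if the token is authentic, unexpired, and in scope; else None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; the signature is verified before the payload is parsed.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"] or required_scope not in claims["scopes"]:
        return None
    return claims["sub"]
```

A token issued to one agent for `crm:read` with a five-minute lifetime fails the check for any other scope and for any request after expiry, so a stolen copy is worth far less than a copied API key. The check also returns the agent's own identifier, which is what turns "someone did this" into "this agent did this" in the audit trail.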
When the Prompt Is the Payload
If identity is the slower-burning problem, prompt injection is the one that's already setting things on fire. OWASP's 2025 Top 10 for LLM Applications kept it at the number-one slot for a second consecutive edition, and for good reason: an LLM has no architectural separation between "instructions I should obey" and "data I should merely read." Feed it both in the same channel, and a sufficiently clever attacker can make the model treat the second as the first.
The cleanest public demonstration of how bad this gets in practice is CamoLeak, the vulnerability researcher Omer Mayraz disclosed through Legit Security in October 2025, tracked as CVE-2025-59145 with a CVSS score of 9.6. The setup was almost playful: hide an instruction inside a pull request's invisible comment field, wait for a developer to ask GitHub Copilot Chat to review that PR, and let Copilot — operating with that developer's own repository privileges — quietly search the codebase for strings like "AWS_KEY," then exfiltrate whatever it found one character at a time. Each character got mapped to its own GitHub-hosted image URL, routed through GitHub's own trusted Camo proxy so the outbound traffic looked like nothing more than a chat window rendering a picture. Legit Security's CTO, Liav Caspi, put the core problem plainly: a vigilant network monitor might catch the unusual request pattern, but the average user or maintainer almost certainly wouldn't. GitHub closed the hole in August by disabling image rendering in Copilot Chat entirely — a blunt fix, but an honest acknowledgment that there was no elegant patch for the underlying design flaw.
What should worry you is that CamoLeak is GitHub-specific plumbing wrapped around a generic problem. Any agent that reads untrusted content and can also take action — summarize an inbox, browse a webpage, query a ticketing system — has the same exposed nerve. The attack surface isn't the code. It's the fact that the model can't reliably tell an instruction from a sentence describing one.
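One partial mitigation for the CamoLeak pattern is to filter both ends of the channel. This is a rough sketch, not GitHub's actual fix (which, as noted, was to turn image rendering off). It strips invisible HTML comments from untrusted text before the model sees it, and removes markdown images from model output unless their host is on an allowlist. The function names and example hosts are hypothetical, and a filter like this narrows one exfiltration route without solving the underlying instruction-versus-data problem.

```python
import re
from urllib.parse import urlparse

# Matches HTML comments, including multi-line ones, which render as invisible text.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Matches a whole markdown image and captures its URL: ![alt](url "optional title")
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")


def strip_hidden_comments(untrusted_text):
    """Remove invisible HTML comments from content before it reaches the model."""
    return HTML_COMMENT.sub("", untrusted_text)


def drop_untrusted_images(model_output, allowed_hosts):
    """Replace markdown images whose host is not allowlisted with a placeholder."""

    def replace(match):
        host = urlparse(match.group(1)).hostname or ""
        return match.group(0) if host in allowed_hosts else "[image removed]"

    return MD_IMAGE.sub(replace, model_output)
```

The design choice worth noticing is that neither function asks the model for its opinion. Both run outside it, on plain strings, which is the only place a filter can sit if the threat is a corrupted model.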
MCP Didn't Invent the Confused Deputy. It Industrialized It.
The Model Context Protocol has been public only since November 2024, and in agent circles it's already being described, only half-jokingly, as the USB-C of AI tooling: a single standard that lets an agent plug into dozens of databases, SaaS platforms, and internal systems without custom integration code for each one. That convenience is precisely why it became 2025's most interesting new attack surface. CVE-2025-49596, rated 9.4, let attackers run arbitrary commands through unauthenticated MCP Inspector instances. CVE-2025-6514, found in the widely used mcp-remote project, hit 9.6 and gave attackers OS-level command execution simply by getting an MCP client to connect to a malicious server. Researchers at Invariant Labs separately showed they could pull private repository data and WhatsApp message history out through MCP integrations that trusted server-supplied tool descriptions a little too much.
That last detail is the one practitioners now call tool poisoning, and it deserves more attention than it gets. An MCP server doesn't just expose a function — it ships a natural-language description of that function for the model to read. Bury a hidden instruction inside that description, and the agent absorbs it as context with the same credulity it would extend to legitimate documentation. Layer in what researchers call a rug pull — a tool that behaved safely last week, silently swapping in malicious behavior this week, with no re-approval prompt — and you've got a supply chain risk that traditional dependency scanning has no vocabulary for.
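One way to give that rug pull a vocabulary is to pin what was approved. The sketch below fingerprints each tool's name, description, and input schema at approval time, then flags any advertised tool whose fingerprint is new or has changed. It assumes tool definitions arrive as dictionaries with `name`, `description`, and `inputSchema` keys. The `fingerprint` and `changed_tools` helpers are hypothetical and not part of any MCP SDK.

```python
import hashlib
import json


def fingerprint(tool):
    """Hash the fields the model actually reads, in a canonical key order."""
    canonical = json.dumps(
        {
            "name": tool["name"],
            "description": tool["description"],
            "inputSchema": tool.get("inputSchema", {}),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def changed_tools(approved_pins, advertised_tools):
    """Return names of advertised tools that are unpinned or differ from their pin."""
    return sorted(
        tool["name"]
        for tool in advertised_tools
        if approved_pins.get(tool["name"]) != fingerprint(tool)
    )
```

Anything this returns would go back through human re-approval before the agent is allowed to see the new description. That catches a silent swap, though it does nothing about a description that was poisoned before the first approval.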
Underneath all of it sits the same architectural sin the original insider-threat literature has been naming for years: authorization quietly divorcing from authentication. An MCP server executing a database query on an agent's behalf needs to know not just that the agent is who it claims to be, but what the human or task behind that request was actually authorized to do. Skip that check, and you've built a confused deputy that will dutifully escalate its own privileges on a stranger's behalf.
Where the Policy Engine Has to Live
The architecture pattern that's converging across the vendors and practitioners I trust most isn't subtle, and that's its strength. You insert a policy decision point — Cerbos, Open Policy Agent, or an equivalent — directly in the path between the agent's tool calls and the systems those calls touch, so that nothing executes on trust alone:
    User
      |
      v
    AI Agent ----(declares identity + intent)----> Policy Engine (PDP)
      ^                                                  |
      |                                          allow?  |  deny?
      |                                                  v
      |                                             MCP Server -----> Database / API
      |                                                  |
      +----------------(action result)-------------------+
The point of that middle box is to ask a boring, specific question on every single call: which agent is this, what was it actually asked to do, and does this particular action fall inside that scope? "Only SalesBot may call lookup_customer." "Any transfer above a threshold requires a human approval step before the MCP server executes it." None of that logic lives in the model's good judgment, because the model's judgment is exactly what prompt injection is designed to corrupt. The enforcement has to sit somewhere a crafted sentence can't reach it.
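Those two example rules can be sketched as a tiny decision function. This is a plain-Python stand-in for a policy decision point, not the Cerbos or Open Policy Agent API. `SalesBot` and `lookup_customer` come from the rules above, while `FinanceBot`, `transfer_funds`, and the 10,000 threshold are assumptions invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical policy data: which agent may call which tool.
ALLOWED_TOOLS = {
    "SalesBot": {"lookup_customer"},
    "FinanceBot": {"transfer_funds"},
}
# Hypothetical threshold above which a human must approve a transfer.
APPROVAL_THRESHOLD = 10_000


@dataclass
class ToolCall:
    agent_id: str
    tool: str
    args: dict = field(default_factory=dict)


def decide(call, human_approved=False):
    """Return 'allow', 'deny', or 'needs_approval' for a single tool call."""
    # Rule 1: default deny unless this agent is explicitly granted this tool.
    if call.tool not in ALLOWED_TOOLS.get(call.agent_id, set()):
        return "deny"
    # Rule 2: large transfers wait for a human, whatever the model says.
    if call.tool == "transfer_funds":
        amount = call.args.get("amount")
        if not isinstance(amount, (int, float)) or amount <= 0:
            return "deny"
        if amount > APPROVAL_THRESHOLD and not human_approved:
            return "needs_approval"
    return "allow"
```

Nothing in `decide` reads the prompt or the model's reasoning. It sees only who is calling, which tool, and with what arguments, so an injected sentence has no input through which to change the outcome.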
This is also, not coincidentally, where the Cloud Security Alliance's "toxic cloud trilogy" (a public workload, a real vulnerability, and standing high-level privilege, all present at once) actually gets defused. CSA's own telemetry shows the combination was present in 38 percent of workloads in early 2024 and had fallen to 29 percent by mid-2025, as organizations started pulling standing privilege out of the equation. That's real progress. It's also nowhere near fast enough for the rate at which agents are being deployed.
What 2026 Actually Requires
I don't think the next twelve months are going to be defined by a single dramatic breach, although there will probably be one anyway. I think they'll be defined by something quieter and more structural: the slow, overdue migration of agents off static, shared credentials and onto something closer to what SPIFFE and SPIRE were originally built for in the service-mesh world — short-lived, cryptographically verifiable, per-workload identity that can be issued, scoped, and revoked without anyone touching a spreadsheet of API keys. OWASP published a dedicated Non-Human Identity Top 10 in 2025 for exactly this reason; the existing application-security and human-IAM playbooks simply don't have entries for credentials that never sleep, never request access, and inherit whatever standing permission happens to be sitting there.
The governance gap is still wide open. Recent industry surveys put the share of organizations with mature agent-governance programs below one in five, even as more than ninety percent of security leaders rate the problem as critical. That mismatch of high anxiety and low operational maturity is usually the exact condition under which the expensive breach happens. My honest read, after a year of watching this space accelerate: the organizations that treat their agents as first-class, individually identified, least-privileged principals from day one will look unremarkable in hindsight. The ones that don't will be writing the incident reports everyone else cites in 2027.