Non-Human Identity Security in the Age of AI
The rise of AI in enterprises has expanded the attack surface. Learn how you can secure non-human identities and prevent unauthorized access.
It is not a coincidence that non-human identities (NHIs) have come into focus recently while AI-powered tools and autonomous agents are rapidly being adopted. In fact, this is partially what is driving the explosion of NHIs in the enterprise. This has sparked a lot of research and conversations about machine identity and governance.
Like human users of systems, NHIs, such as AI agents, bots, scripts, and cloud workloads, operate using secrets. These credentials grant access to sensitive systems and data. They can take many forms and must always be accounted for, from creation to offboarding. Unlike humans, machines can't use multifactor authentication or passkeys, and developers can generate hundreds of these credentials through their work deploying applications.
The adoption rate for AI in the enterprise has been staggering, driving developers to roll out NHIs faster than ever. AI brings with it an opportunity to help us get a lot more done more efficiently, but it also brings some real risks around privacy, secrets exposure, and insecure code. There are some amazing use cases for LLMs, but we need to remember that, like any technology, the more you add to your environment, the larger the attack surface grows. This is especially true when we give agency to these AI agents.
The time to tackle NHI security in your increasingly AI-powered organization is now. Let's take a look at some of the risks associated with AI agents' NHIs.
The NHI Risks From AI
AI Agents and Secrets Sprawl
"AI agents" are LLM-based systems that decide independently how to accomplish a specified task. These are not the deterministic bots we have been using with many workflows for years, which can only perform the specific instructions the developer laid out step by step. These AI agents can access internal data sources to look up information, search the internet, and interact with other applications on behalf of the user.
For example, an AI-powered procurement agent could analyze purchasing needs, compare vendors through online shopping sites, negotiate prices with AI chatbots, and even autonomously place orders if allowed. Every one of these secure communications requires access credentials. The agent itself is produced through a DevOps process, which requires even more authentication across the pipeline. Along the way, credentials commonly, and accidentally, get scattered across systems, logs, and repositories.
It is very common to grant AI agents wider read, write, and even creation and deletion permissions than we would for deterministic bots. For traditional machine workers, we define which systems they can and cannot access as part of the work we assign them. Since AI agents are left to determine the best path for completing the job without direct supervision, scoping their access too tightly can block the requested work. Which read and write permissions will be required might not be clear from the beginning, and many teams are erring on the side of being too permissive.
A leak of any one of the many keys involved could lead to a data breach or unauthorized purchases, among other risks. Strong non-human identity governance is essential to secure these AI agents. For all known and properly stored credentials in your vaults, enforce least privilege access, API key protection, and audit logging to prevent exploitation. Your strategy also needs to account for the secrets that will undoubtedly be found outside of your vaults.
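As a rough illustration of least privilege combined with audit logging, the sketch below denies every action an agent has not been explicitly granted and records each decision. The `AgentIdentity` record, the `is_allowed` helper, and the resource names are hypothetical, invented here for illustration; a real deployment would rely on the access controls of your identity provider or cloud platform.

```python
import logging
from dataclasses import dataclass

# Hypothetical audit logger name; route it to wherever your audit trail lives.
audit_log = logging.getLogger("nhi.audit")


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical record describing one non-human identity."""
    name: str
    allowed: frozenset  # explicitly granted (resource, action) pairs


def is_allowed(identity: AgentIdentity, resource: str, action: str) -> bool:
    """Deny by default: permit only explicitly granted (resource, action) pairs."""
    decision = (resource, action) in identity.allowed
    # Record every decision, allowed or denied, so access can be reviewed later.
    audit_log.info(
        "agent=%s resource=%s action=%s allowed=%s",
        identity.name, resource, action, decision,
    )
    return decision
```

Starting from an empty grant set and adding permissions as the agent's real needs become clear keeps the scope narrow, while the audit trail shows which requests were denied and may need review.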
Orphaned API Keys
An orphaned API key is an API key that is no longer associated with a user account. This happens when a user leaves a company or deletes their account. Any API keys they made stay valid, but now, no one owns them, and unless properly accounted for, they are likely to never get rotated or deleted.
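One way to surface such keys, assuming you can export a key inventory and a list of active accounts, is to compare the two. The `ApiKeyRecord` shape and both helper functions below are hypothetical and only sketch the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Set


@dataclass
class ApiKeyRecord:
    """Hypothetical inventory entry for one API key."""
    key_id: str
    owner: Optional[str]     # account that created the key, if known
    last_rotated: datetime   # timezone-aware timestamp


def find_orphaned(keys: Iterable[ApiKeyRecord], active_owners: Set[str]) -> List[ApiKeyRecord]:
    """Return keys with no owner, or whose owner is no longer an active account."""
    return [k for k in keys if k.owner is None or k.owner not in active_owners]


def find_stale(keys: Iterable[ApiKeyRecord], max_age: timedelta,
               now: Optional[datetime] = None) -> List[ApiKeyRecord]:
    """Return keys that have not been rotated within max_age."""
    now = now or datetime.now(timezone.utc)
    return [k for k in keys if now - k.last_rotated > max_age]
```

Keys that show up in both lists are the strongest candidates for revocation, since nobody owns them and nobody has rotated them.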
In the world of machine identities, the question of who owns an NHI is a tricky one. Is it the person who created it? A dedicated DevOps team? Without clear ownership, the likelihood of a credential becoming orphaned and forgotten yet still allowing access is very high.
A better question would be, who owns the risk associated with a breach caused or aided by these API keys?
Prompt-Based Architecture and Sensitive Data Exposure
When we think of an AI assistant, we immediately think of players like ChatGPT, Gemini, and Claude, all of which use prompt-based architectures. So does GitHub Copilot. The AI models process, store, and transmit sensitive information through prompts, sending context, commands, and data to a large language model (LLM) provider. This approach makes these tools exceptionally easy to interact with, leading to rapid prototyping and tool development.
This is not isolated to your development teams. In fact, as we see shadow IT become the majority of the IT spend in many organizations, the real risks of exposing proprietary business data and leaking credentials spread throughout the whole enterprise.
For example, if a finance team uses an AI-powered chatbot to automate invoice processing, and their prompt contains "Find all invoices over $100,000 in the past 6 months using API key ABC123", that API key will most likely now be logged. If those logs are left in plaintext, they would allow an attacker unauthorized access to that invoicing system. Hopefully, that key is properly scoped.
Safeguards need to be put in place to prevent developers and all users from embedding sensitive data in prompts and logs. Ideally, each LLM output can be scanned for information that should not be there. While defining what returned data is sensitive can be tricky, finding and eliminating secrets should be straightforward and prioritized.
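A minimal sketch of such a safeguard is a redaction pass applied to prompts before they leave your environment and to LLM outputs before they are stored. The patterns below are deliberately crude and illustrative only: the `sk_live_` prefix comes from the example later in this article, `AKIA` followed by 16 characters is the common AWS access key ID shape, and the last rule catches phrases like the invoice prompt above. The `redact_secrets` helper is hypothetical, and a dedicated secrets scanner would cover far more formats.

```python
import re

# Illustrative patterns only; each one names the sensitive part as "secret".
SECRET_PATTERNS = [
    re.compile(r"\b(?P<secret>sk_live_[A-Za-z0-9]+)\b"),
    re.compile(r"\b(?P<secret>AKIA[0-9A-Z]{16})\b"),
    # "API key <value>": the value must contain a digit, which avoids
    # flagging ordinary phrases such as "API key rotation".
    re.compile(
        r"(?i)\bapi[ _-]?key\b\s*[:=]?\s*"
        r"(?P<secret>(?=[A-Za-z0-9_-]*\d)[A-Za-z0-9_-]{6,})"
    ),
]


def redact_secrets(text: str):
    """Return (redacted_text, number_of_redactions)."""
    count = 0

    def _mask(match):
        nonlocal count
        count += 1
        start, end = match.span("secret")
        offset = match.start()
        whole = match.group(0)
        # Replace only the secret itself and keep the surrounding words.
        return whole[: start - offset] + "[REDACTED]" + whole[end - offset:]

    for pattern in SECRET_PATTERNS:
        text = pattern.sub(_mask, text)
    return text, count
```

A non-zero count can also be treated as an incident in its own right: the key was typed into a prompt, so it should be rotated even though the redaction kept it out of the log.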
AI Agents and Data Collection Risks
AI agents often ingest, process, and store data from various sources, including:
- Cloud storage such as AWS S3 and Google Drive
- Enterprise applications like Jira, Confluence, and Salesforce
- Messaging systems, including Slack and Microsoft Teams
We need to work to keep all sensitive information, such as credentials, PII, or other private data, out of these systems. If your AI agent can access any data in these systems, then an attacker who abuses that agent's NHI can reach the same data.
The only sure way to eliminate this attack vector is to find and rotate every key exposed in the internal systems your AI agents can reach. This includes version control systems, ticketing systems, and messaging platforms. Combined with good log sanitization, this can go a long way to keeping your secrets secret.
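For the log side of this, one possible approach in Python is a `logging.Filter` that masks secret-looking strings before a record is written. The `RedactSecretsFilter` class and its two patterns are assumptions made for illustration, not a complete detector.

```python
import logging
import re


class RedactSecretsFilter(logging.Filter):
    """Mask secret-looking strings before a log record is emitted."""

    # Illustrative patterns only: the sk_live_ prefix used in this article's
    # example and the common AWS access key ID shape.
    PATTERN = re.compile(r"\bsk_live_[A-Za-z0-9]+\b|\bAKIA[0-9A-Z]{16}\b")

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so secrets passed as arguments
        # are covered as well as secrets in the format string.
        record.msg = self.PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = None
        return True  # keep the record, now sanitized
```

Attaching it with `handler.addFilter(RedactSecretsFilter())` applies the redaction to everything that handler writes. This limits what a leaked log can reveal, but it does not replace rotating any key that was exposed.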
AI-Generated Code and Embedded Secrets
AI-powered development tools like GitHub Copilot, Amazon CodeWhisperer, and Tabnine have seen a rapid rate of adoption. Today, over 50% of developers use AI copilots to assist with coding. These tools auto-generate code snippets based on vast amounts of training data.
However, this introduces a major security risk, as AI-generated code may lead a busy developer to hardcode secrets, such as API keys, database credentials, and even OAuth tokens.
For an example of an AI-generated risk, imagine a developer asks Copilot to generate an API call to a cloud service, and it produces:
import requests
API_KEY = "sk_live_ABC123XYZ"
response = requests.get("https://api.example.com/data", headers={"Authorization": f"Bearer {API_KEY}"})
This example code was produced by ChatGPT.
While that is not a real key, a developer who is under a time crunch or unfamiliar with secrets security might simply swap in a real key for the generated one. If left unchecked, such code may get committed to version control, exposing the credential to any attacker who gains access to the repo, or to anyone at all if the repo becomes public.
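A safer variant of the generated snippet reads the key from the environment at runtime, so the source file never contains it. This is a minimal sketch: the variable name `EXAMPLE_API_KEY` and both helper functions are hypothetical, and in practice the value would be injected by your secrets manager or deployment tooling.

```python
import os


def load_api_key(env_var: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; refusing to fall back to a hardcoded key"
        )
    return key


def auth_headers(env_var: str = "EXAMPLE_API_KEY") -> dict:
    """Build the Authorization header without the key appearing in source."""
    return {"Authorization": f"Bearer {load_api_key(env_var)}"}
```

The request from the example above would then become `requests.get("https://api.example.com/data", headers=auth_headers())`, with nothing sensitive left to commit.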
This pattern is part of how secrets sprawl has continued to get worse over time. We need to give developers guardrails that speed them along while scanning all code before it is committed, such as pre-commit hooks.
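To make the pre-commit idea concrete, here is a rough sketch of the check such a hook could run. The two rules and the function names are assumptions made for illustration, and a dedicated secrets scanner will catch far more than this. Git blocks a commit when its pre-commit hook exits with a non-zero status, so a hook script could pass the staged file paths to `scan_paths` and exit with the returned code.

```python
import re

# Illustrative rules only.
RULES = {
    # A quoted literal assigned to a name that looks like a credential.
    "hardcoded-assignment": re.compile(
        r"(?i)\b\w*(api_?key|secret|token|password)\w*\s*=\s*[\"'][^\"']{8,}[\"']"
    ),
    # The key prefix used in this article's example.
    "live-key-prefix": re.compile(r"\bsk_live_[A-Za-z0-9]+"),
}


def scan_text(text: str):
    """Return (line_number, rule_name) for every suspicious line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


def scan_paths(paths) -> int:
    """Scan files and return 1 if anything was found, else 0."""
    exit_code = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for line_no, rule in scan_text(handle.read()):
                print(f"{path}:{line_no}: possible secret ({rule})")
                exit_code = 1
    return exit_code
```

Run against the generated snippet above, `scan_text` flags the `API_KEY = "sk_live_ABC123XYZ"` line under both rules, while a line that reads the key from an environment variable passes.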
The Path Forward: Securing Non-Human Identities
The first step toward getting a handle on the NHIs in your organization is finding out which secrets you have. You need a way to automatically discover AI-agent credentials across enterprise environments, both inside and outside your vaults. Any credentials found outside of a vault need to be moved into one.
An accounting of these NHI credentials is only a start, because just finding a plaintext key does not tell you much about what is at risk. You still need to figure out where it goes, how it can be used, and what critical systems are accessible.
Mapping how your secrets interconnect shows you what each one connects to and what is at risk if exposure does occur. You also need to prevent sensitive data from being embedded in prompts or logs. Finding these incidents in real time can help you ensure sanitization is happening, so that logs are less helpful to an attacker.
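The mapping step can be pictured as a graph walk: start from the systems a secret unlocks directly, then follow onward access until nothing new turns up. The sketch below assumes you already have two hypothetical lookups, `grants` (secret to systems) and `reachable_from` (system to further systems). Building those lookups is the hard part and is not shown.

```python
from collections import deque
from typing import Dict, Iterable, Set


def blast_radius(secret_id: str,
                 grants: Dict[str, Iterable[str]],
                 reachable_from: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every system reachable, directly or indirectly, with one secret."""
    seen: Set[str] = set()
    queue = deque(grants.get(secret_id, ()))
    while queue:
        system = queue.popleft()
        if system in seen:
            continue  # already visited; also protects against cycles
        seen.add(system)
        queue.extend(reachable_from.get(system, ()))
    return seen
```

Ranking leaked or unvaulted secrets by the size and sensitivity of this set gives a rough order for remediation.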
Getting Your NHI Governance Strategy Ready for the Speed of AI
There are no guarantees about what AI agents will ultimately deliver, but it is clear that it is going to happen at an accelerated pace. The complexity we are introducing now by deploying self-guiding AI agents that interact with other systems and agents brings a number of perils along with a lot of promise for working more efficiently. We call on you to move forward securely as you add more NHIs faster than ever.
As AI adoption grows, securing machine identities is no longer optional; it is essential.
Published at DZone with permission of Dwayne McDaniel. See the original article here.