Securing Your Machine Identities Means Better Secrets Management

Machine identities make up the majority of the over 12.7 million secrets discovered in public on GitHub in 2023. Let's look at how we got here and how we fix this.

Dwayne McDaniel

Jul. 09, 24 · Analysis

Likes (1)

Comment

Save

3.2K Views

In 2024, GitGuardian released the State of Secrets Sprawl report. The findings speak for themselves; with over 12.7 million secrets detected in GitHub public repos, it is clear that hard-coded plaintext credentials are a serious problem. Worse yet, it is a growing problem, year over year, with 10 million found the previous year and 6 million found the year before that. These are not cumulative findings!

When we dig a little deeper into these numbers, one overwhelming fact springs out: specific secrets detected, the vast majority of which are API keys, outnumber generic secrets detected in our findings by a significant margin. This makes sense when you realize that API keys are used to authenticate specific services, devices, and workloads within our applications and pipelines to enable machine-to-machine communication. This is very much in line with research from CyberArk, machine identities outnumber human identities by a factor of 45 to one. This gap is only going to widen continually as we integrate more and more services in our codebases and with ever-increasing velocity.

Secrets sprawl is clearly a problem for both human and machine identities, so why should we call out this distinction?

Machine Identities

"Machine identities" is a term used to distinguish this area of secrets sprawl and its unique challenges apart from human identities and credentials. Each is problematic, but each calls for different approaches. We are following the naming convention from industry leaders in secrets management, such as CyberArk, and analyst firms who define the industry, such as Gartner, in standardizing this terminology. Gartner defines the term in their 2020 IAM Technologies Hype Cycle report as, "Simply put, a machine identity is a credential used by any endpoint (which could be an IoT device, a server, a container, or even a laptop) to establish its legitimacy on a network." This term covers all API access keys, certificates, Public key infrastructure (PKI), and any other way possible to authenticate machine-to-machine communication.

Is a Machine Identity the Same as a Non-Human Identity?

From a purely grammatical perspective, it must be a non-human identity if it is not a human identity. So why use the specific term machine identity? Well, practically speaking, a non-human could be a dog, a plant, or even a planet. When using the term "non-human" we must also necessarily further qualify what we mean, while the term 'machine identity' already has a widely accepted definition that narrows the scope to the secrets sprawl problem space.

For example, Venafi, a leading machine identity management platform, succinctly states, "The phrase “machine” often evokes images of a physical server or a tangible, robot-like device, but in the world of machine identity management, a machine can be anything that requires an identity to connect or communicate—from a physical device to a piece of code or even an API."

How Did We Get Here?

Before we can talk about what to do about the issues of machine identities and secrets sprawl, it might be helpful to take a historical look at how we arrived at this point in the industry. In the early days of computer science, the only "entities'"we had to worry about accessing our machines and our code were humans. In the days of ENIAC or early UNIX systems, using a simple password and perhaps sturdy locks on the doors were really all you needed to ensure only the proper people could access a system. People love passwords, and we have for thousands of years. The Roman garrison used "watchwords," which needed to be updated nightly, meaning we have been practicing manual password rotation for a couple of millennia now.

So, naturally, when it came time to implement machine-to-machine authentication, ensuring that we were only allowing access to trusted systems to recognize and communicate with one another, it was only natural we would turn to our old friend the password, in the form of a long and hard to guess token to get the job done. This system works okay until you remember the problem statement that started this article: we keep leaking these credentials into our code and into places around our code like Jira, Slack, and Confluence at an alarming rate.

Solving Both Human Identity and Machine Identity Sprawl

Now that we have a common vocabulary and understand the two areas of concern, human and machine, what are our next steps? Let's start with human identities. People need to be able to authenticate to gain access to systems to get their work done. Using phishing-resistant MFA, preferably hardware-based, at every juncture where a human uses a password is a solid approach. Even if a password is leaked, it is much harder to exploit and gives the user time to rotate the credential. While not a silver bullet, Microsoft believes this could stop up to 99.9% of fraudulent sign-ins. Even better, if there is a way to eliminate that password, such as with a passkey using FIDO2 or hardware-based biometrics for authentication, then we should probably move in that direction.

Dealing with machine resources requires a different approach, as we can't just turn on MFA for machines. We also can't disrupt these machine identities, as the business of the enterprise is to do business, and the connections must continue to allow our systems to function and satisfy the availability leg of the CIA Triad. Similarly, we can not devote endless resources and hours to this issue, as new vulnerabilities in the form of CVEs, misconfiguration, and licensing issues continue to be other areas security teams need to tackle.

In an ideal world, we could immediately move all of our systems to leverage short-lived certificates or JWTs that are issued at run time when needed and only live for the life of the request. Indeed, there are frameworks such as SPIFFE and its implementation, SPIRE, that can help organizations achieve this goal. While this is indeed a great approach, it has the real-world issues of developer adoption, development time and effort, and the overhead of running such services at scale.

While we can dream up many such ideal scenarios, we need to address the current situation head-on. Developers will continue to use machine identities, which can be leaked and exploited by attackers. At the same time, we know that if a malicious actor gets their hands on a secret, they can only leverage it if it is still valid. We believe the best practical solution for any organization is to rotate secrets much more frequently.

Automatically Rotating Secrets More Frequently

One of the other stand-out findings from our State of Secrets Sprawl Report was the fact that of all the valid secrets we discovered in public, over 90% were still valid five days later. We believe this points to the fact that teams expect secrets to be long-lived and that the current manual approach to secrets rotation is hard. Further evidence of these conclusions can be found in breach reports involving companies such as Cloudflare.

In this Secret Management Maturity Model white paper, a clear differentiator in organizations in the Advanced and Expert categories is that they have adopted regular credential rotation policies. It is very unlikely these mature organizations are doing manual rotation, as that would be an overwhelming, time-consuming, and error-prone process, which potentially could mean disaster in our interconnected architectures.

We need a way to automate the rotation process. The good news is that awesome tools are available, such as CyberArk's Conjure or AWS Secrets Manager, that make the process of auto-rotation pretty straightforward. Of course, this assumes all of your machine identities already and totally live within their system.

Auto-Rotation of Secrets First Means Knowing All Your Machine Identities

Now, we could ask for every developer and infrastructure owner to give security teams a list of all their credentials in plaintext for all their various workloads, services, and devices, but obviously, that is a terrible and highly problematic idea.

In all seriousness, what is needed is a scalable end-to-end solution that can help you systematically and automatically find all the plaintext credentials inside of your code base, leaked out onto GitHub publicly, or even found in the communication tools that surround your code.

Look for solutions that:

Gather all the data about a secret sprawl incident into a single logical unit
Are reachable by an API call or webhook, making it possible to interoperate with other systems
Can handle any volume of files to scan and can scan in multiple systems, both historically and in real-time
Offer developer tooling that helps prevent the issue in the first place

With such a tool in hand, you can find and then implement auto-rotation solutions.

Final Thought

No matter how you tackle the Machine Identity crisis in your organization, make sure you start sooner rather than later, as you will never have as few secrets in your environments as you do right this moment.

Machine Information security management Password manager

Published at DZone with permission of Dwayne McDaniel. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending