Hardcoded secrets are an example of a sensitive data leak. Sensitive data leaks happen when an application exposes sensitive data, such as credentials, secret keys, personal information, or configuration information, to people who shouldn’t have access to that information.
For instance, if an application writes sensitive personal information like customers’ credit card numbers into application logs, that information becomes accessible to anyone who can read those logs, such as system analysts. It’s also common for applications to leak users’ private information in the page source of profile pages. So how do you determine whether your application is at risk, and how should you go about discovering these sensitive data leaks?
Finding Data Leaks That Matter
Scanning for secrets with regular expressions and entropy checks is an effective first step toward identifying potential data leaks. Spotting the leaks that can actually lead to compromise is a harder problem.
For one, not all code is open-sourced, and some hardcoded secrets may not be at risk of being leaked to the public at all. Some sensitive data is also obfuscated before attackers can get their hands on it. To understand which pieces of sensitive data would actually cause you problems, you’ll need to understand how each piece could be leaked.
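To make the regex-and-entropy first step concrete, here is a minimal sketch in Python. The pattern, the name hints (`secret`, `token`, `password`, `api_key`), and the 3.5 bits-per-character threshold are all assumptions chosen for illustration, not values taken from any particular scanner.

```python
import math
import re
from collections import Counter

# Hypothetical pattern: a quoted literal of 16+ characters assigned to a
# name that looks like a credential. Real scanners use many more rules.
SECRET_ASSIGNMENT = re.compile(
    r"""(?i)\b(\w*(?:secret|token|password|api_?key)\w*)\s*=\s*["']([^"']{16,})["']"""
)


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_candidate_secrets(source: str, min_entropy: float = 3.5):
    """Yield (line_number, name, value) for secret-like, high-entropy literals."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in SECRET_ASSIGNMENT.finditer(line):
            name, value = match.group(1), match.group(2)
            # Random-looking strings score high; placeholders score low.
            if shannon_entropy(value) >= min_entropy:
                yield line_number, name, value
```

A scan like this only says that something looks like a secret. It says nothing about whether that value can ever reach an attacker, which is the gap the rest of this section addresses.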
Sources and Sinks
In code analysis speak, a “source” is the code that allows a vulnerability to happen, whereas a “sink” is where the vulnerability actually happens. Take command injection vulnerabilities, for example. A “source” in this case could be a function that takes in user input, and the “sink” would be a function that executes system commands. If the untrusted user input can get from “source” to “sink” without proper sanitization or validation, there is a command injection vulnerability. Many common vulnerabilities can be identified by tracking this “data flow” from appropriate sources to corresponding sinks.
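The command injection case can be sketched in a few lines of Python. Every function name here (`get_host`, `build_unsafe_command`, `build_safe_argv`, `run_in_shell`, `run_without_shell`) is hypothetical, and the allow-list pattern is a deliberately strict assumption rather than a complete hostname validator.

```python
import re
import subprocess

# Assumed allow-list: letters, digits, dots, and hyphens, not starting with
# a hyphen (so the value cannot be mistaken for a command-line option).
HOST_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")


def get_host(request_params: dict) -> str:
    """Source: returns a value that the user fully controls."""
    return request_params["host"]


def build_unsafe_command(host: str) -> str:
    """Concatenates tainted input into a shell command string."""
    return "ping -c 1 " + host


def build_safe_argv(host: str) -> list[str]:
    """Validates the input between source and sink, then builds an argv list."""
    if not HOST_PATTERN.fullmatch(host):
        raise ValueError("invalid host")
    return ["ping", "-c", "1", host]


def run_in_shell(command: str) -> subprocess.CompletedProcess:
    """Sink: the shell interprets metacharacters such as ';' in the command."""
    return subprocess.run(command, shell=True, capture_output=True, text=True)


def run_without_shell(argv: list[str]) -> subprocess.CompletedProcess:
    """Safer sink: each argument is passed to the program as-is, with no shell."""
    return subprocess.run(argv, capture_output=True, text=True)
```

With a payload such as `example.com; cat /etc/passwd`, the path `get_host` → `build_unsafe_command` → `run_in_shell` carries the attacker's text to the shell unchanged, while `build_safe_argv` rejects it before it reaches any sink.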
Sensitive data leaks can be identified this way too. The “source” of a sensitive data leak is usually a variable containing sensitive information, or any functionality that uses that variable. A “sink” in this context could be any function that causes information to be displayed to users, such as logging, sending automated emails, or writing to web pages. If the sensitive “source” can make its way to a “sink” function, the sensitive data could be leaked.
Tracking the flow of sensitive data to see whether it can reach a dangerous sink is an effective way of determining the actual risk of a data leak. For instance, we can track whether a sensitive literal, such as a secret key, can reach an untrusted sink.
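As a rough illustration of tracking a sensitive literal to a sink, the sketch below uses Python's standard `ast` module. It is a deliberately simplified, flow-insensitive analysis: it ignores statement order, function boundaries, and attribute or container assignments. The sink list and the name hints are assumptions for the example, not a standard set.

```python
import ast

# Assumed sinks and name hints, for illustration only.
SINK_CALLS = {"print", "logging.info", "logging.debug", "logging.error"}
SENSITIVE_HINTS = ("secret", "password", "token", "api_key")


def dotted_name(node):
    """Return 'a.b' for a Name or Attribute call target, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def names_in(node):
    """Return every variable name referenced anywhere inside node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_leaks(source: str):
    """Return sorted (line, sink, variable) triples for tainted sink arguments."""
    tree = ast.parse(source)
    assigns = [n for n in ast.walk(tree) if isinstance(n, ast.Assign)]
    tainted = set()
    changed = True
    while changed:  # propagate taint through assignments until nothing changes
        changed = False
        for assign in assigns:
            targets = {t.id for t in assign.targets if isinstance(t, ast.Name)}
            is_secret_literal = (
                isinstance(assign.value, ast.Constant)
                and isinstance(assign.value.value, str)
                and any(h in t.lower() for t in targets for h in SENSITIVE_HINTS)
            )
            if is_secret_literal or names_in(assign.value) & tainted:
                new = targets - tainted
                if new:
                    tainted |= new
                    changed = True
    leaks = []
    for call in (n for n in ast.walk(tree) if isinstance(n, ast.Call)):
        sink = dotted_name(call.func)
        if sink in SINK_CALLS:
            args = list(call.args) + [kw.value for kw in call.keywords]
            used = set().union(*[names_in(a) for a in args])
            leaks.extend((call.lineno, sink, name) for name in used & tainted)
    return sorted(leaks)
```

Given a script that assigns a string to `SECRET_KEY`, builds `message` from it, and then calls `logging.info(message)`, `find_leaks` reports the `logging.info` line, while a call that logs an unrelated variable is not reported. Production analyzers track these flows across functions and files, which this sketch does not attempt.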