Hacking and Securing Python Applications
The first step to fixing vulnerabilities is to know what to look for. Here are 27 of the most common ones that affect Python apps and how you can find and prevent them.
Securing applications is not the easiest thing to do. An application has many components: server-side logic, client-side logic, data storage, data transportation, API, and more. With all these components to secure, building a secure application can seem really daunting.
Thankfully, most real-life vulnerabilities share the same root causes. And by studying these common vulnerability types, why they happen, and how to spot them, you can learn to prevent them and secure your application.
The use of every language, framework, or environment exposes the application to a unique set of vulnerabilities. The first step to fixing vulnerabilities in your application is to know what to look for. Today, let’s take a look at 27 of the most common vulnerabilities that affect Python applications, and how you can find and prevent them.
Let’s secure your Python application! The vulnerabilities I will cover in this post are:
- XML external entity attacks (XXE)
- Insecure deserialization
- Remote code execution (RCE)
- SQL injection
- NoSQL injection
- LDAP injection
- Log injection
- Mail injection
- Template injection (SSTI)
- Regex injection
- XPath injection
- Header injection
- Session injection and insecure cookies
- Host header poisoning
- Sensitive data leaks or information leaks
- Authentication bypass
- Improper access control
- Directory traversal or path traversal
- Arbitrary file writes
- Denial of service attacks (DoS)
- Encryption vulnerabilities
- Insecure TLS configuration and improper certificate validation
- Mass assignment
- Open redirects
- Cross-site request forgery (CSRF)
- Server-side request forgery (SSRF)
- Trust boundary violations
XML External Entity Attacks
XML external entity attacks, or XXE, happen when attackers exploit an XML parser to read arbitrary files on your server. Using an XXE, attackers might also be able to retrieve user information, configuration files, or other sensitive information like AWS credentials. To prevent XXE attacks, you need to explicitly disable external entity resolution and DTD processing in your XML parser.
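As a minimal sketch using only the standard library, one conservative option is to reject any untrusted document that declares a DTD, since that is where external entities are defined. The names `parse_untrusted_xml` and `DTDForbidden` are hypothetical and chosen for illustration; this is one possible approach, not the only one.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted XML")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    # First pass: scan with a bare expat parser that aborts on any DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)
    # Second pass: no DTD is present, so no entities can have been declared.
    return ET.fromstring(data)
```

Parsing twice costs some time, but it keeps the check independent of the parser that builds the tree.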
Insecure Deserialization
Serialization is a process during which an object in a programming language (say, a Python object) is converted into a format that can be saved to the database or transferred over a network. Deserialization is the opposite: the serialized object is read from a file or the network and converted back into an object. Many programming languages support the serialization and deserialization of objects, including Java, PHP, Python, and Ruby.
Insecure deserialization is a type of vulnerability that arises when an attacker can manipulate the serialized object and cause unintended consequences in the program’s flow. Insecure deserialization bugs are often very critical vulnerabilities: an insecure deserialization bug will often result in authentication bypass, denial of service, or even arbitrary code execution.
To prevent insecure deserialization, you need to first keep an eye out for patches and keep dependencies up to date. Many insecure deserialization vulnerabilities are introduced via dependencies, so make sure that your third-party code is secure. It also helps to avoid using serialized objects and utilize simple data types instead, like strings and arrays.
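To illustrate the advice about simple data types, here is a small sketch that accepts user-supplied state as JSON instead of a pickled object and then validates it. The function name `load_preferences` and the `theme` field are assumptions made up for this example.

```python
import json


def load_preferences(raw: str) -> dict:
    # json.loads only produces dicts, lists, strings, numbers, booleans and None.
    # Unlike pickle.loads, it never constructs arbitrary objects or runs code.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    theme = data.get("theme", "light")
    if theme not in {"light", "dark"}:
        raise ValueError("unknown theme")
    # Return only the fields the application knows about.
    return {"theme": theme}
```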
Remote Code Execution
Remote code execution vulnerabilities, or RCE, are a class of vulnerabilities that happen when attackers can execute their code on your machine. One of the ways this can happen is through command injection vulnerabilities. They are a type of remote code execution that happens when user input is concatenated directly into a system command. The application cannot distinguish between where the user input is and where the system command is, so the application executes the user input as code. The attacker will be able to execute arbitrary commands on the machine.
One of the easiest ways to prevent command injection is to implement robust input validation in the form of an allowlist.
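A rough sketch of the allowlist idea: validate the input against a strict pattern, then pass it to the command as a separate argument rather than through a shell. The `ping` use case, the function names, and the hostname pattern are assumptions for illustration only.

```python
import re
import subprocess

# Allowlist: letters, digits, dots and hyphens, starting and ending alphanumeric.
_HOSTNAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9.-]{0,252}[A-Za-z0-9])?")


def build_ping_command(host: str) -> list[str]:
    if not _HOSTNAME_RE.fullmatch(host):
        raise ValueError("invalid hostname")
    # An argument list (no shell=True) keeps the input a single argument.
    return ["ping", "-c", "1", host]


def ping(host: str) -> int:
    # Runs the validated command and returns its exit status.
    return subprocess.run(build_ping_command(host), check=False).returncode
```

Because the pattern requires an alphanumeric first character, input such as `-f` cannot be smuggled in as a command-line option.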
Command injection is also a type of injection issue. Injection happens when an application cannot properly distinguish between untrusted user data and code. When injection happens in system OS commands, it leads to command injection. But injection vulnerabilities manifest in other ways, too.
SQL Injection
In an SQL injection attack, for example, the attacker injects data to manipulate SQL commands. When the application does not validate user input properly, attackers can insert characters special to the SQL language to mess with the query’s logic, thereby executing arbitrary SQL code.
SQL injections allow attacker code to change the structure of your application’s SQL queries to steal data, modify data, or potentially execute arbitrary commands in the underlying operating system. The best way to prevent SQL injections is to use parameterized statements, which make SQL injection virtually impossible.
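Here is a short sketch of a parameterized statement using the standard library's `sqlite3` module. The `users` table and the `find_user` helper are hypothetical; other database drivers use the same idea, though the placeholder syntax may differ.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes in `username` cannot change the query's structure.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchone()


# Demo database, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
```

With string concatenation, an input like `' OR '1'='1` would match every row; here it is simply treated as an unusual username.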
NoSQL Injection
Databases don’t always use SQL. NoSQL databases (or not only SQL databases) are those that don’t use the SQL language. NoSQL injection refers to attacks that inject data into the logic of these database languages. NoSQL injections can be just as serious as SQL injections; they can lead to authentication bypass and remote code execution.
Modern NoSQL databases, such as MongoDB, Couchbase, Cassandra, and HBase, are all vulnerable to injection attacks. NoSQL query syntax is database-specific, and queries are often written in the programming language of the application. For the same reason, methods of preventing NoSQL injection in each database are also database-specific.
LDAP Injection
The Lightweight Directory Access Protocol (LDAP) is a way of querying a directory service about the system’s users and devices. For instance, it’s used to query Microsoft’s Active Directory. When an application uses untrusted input in LDAP queries, attackers can submit crafted inputs that cause malicious effects. Using LDAP injection, attackers can bypass authentication and mess with the data stored in the directory. You can use parameterized queries to prevent LDAP injection.
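In Python, the closest practical equivalent is usually to escape user input before placing it in a search filter. The sketch below hand-rolls the escaping rules from RFC 4515 so it stays self-contained; the helper names and the `uid` attribute are assumptions, and LDAP libraries such as python-ldap ship their own escaping helpers that you would normally prefer.

```python
def escape_ldap_filter_value(value: str) -> str:
    # Escape the characters that are special in LDAP search filters (RFC 4515).
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_filter(username: str) -> str:
    # The escaped value can no longer close the filter or add new clauses.
    return f"(uid={escape_ldap_filter_value(username)})"
```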
Log Injection
You probably conduct system logging to monitor for malicious activities going on in your network. But have you ever considered that your log file entries could be lying to you? Log files, like other system files, could be tampered with by malicious actors. Attackers often modify log files to cover up their tracks during an attack. Log injection is one of the ways attackers can change your log files. It happens when the attacker tricks the application into writing fake entries in your log files.
Log injection often happens when the application does not sanitize newline characters (\n) in input written to logs. Attackers can make use of the newline character to insert new entries into application logs. Attackers can also inject malicious HTML into log entries in an attempt to trigger an XSS in the browser of the admin who views the logs.
To prevent log injection attacks, you need a way to distinguish between real log entries and fake log entries injected by the attacker. One way to do this is by prefixing each log entry with extra meta-data like a timestamp, process ID, and hostname. You should also treat the contents of log files as untrusted input and validate them before accessing or operating on them.
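As a rough sketch of both ideas, the example below neutralizes line breaks in user input and prefixes each entry with a timestamp and process ID. The logger name, the format string, and the helper functions are assumptions for illustration.

```python
import logging


def sanitize_for_log(value: str) -> str:
    # Replace CR/LF with visible escapes so user input cannot start a forged line.
    return value.replace("\r", "\\r").replace("\n", "\\n")


# Each real entry starts with a timestamp and process ID.
logging.basicConfig(format="%(asctime)s %(process)d %(levelname)s %(message)s")
logger = logging.getLogger("app")


def log_failed_login(username: str) -> None:
    logger.warning("failed login for user=%s", sanitize_for_log(username))
```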
Mail Injection
Many web applications send emails to users based on their actions. For instance, if you subscribed to a feed on a news outlet, the website might send you a confirmation with the name of the feed.
Mail injection happens when the application employs user input to determine which addresses to send emails to. This can allow spammers to use your server to send bulk emails to users or enable scammers to conduct social engineering campaigns via your email address.
Template Injection
Template engines are a type of software used to determine the appearance of a web page. These web templates, written in template languages such as Jinja, provide developers with a way to specify how a page should be rendered by combining application data with web templates. Together, web templates and template engines allow developers to separate server-side application logic from client-side presentation code during web development.
Template injection refers to injection into web templates. Depending on the permissions of the compromised application, attackers might be able to use the template injection vulnerability to read sensitive files, execute code, or escalate their privileges on the system.
Regex Injection
A regular expression, or regex, is a special string that describes a search pattern in text. Sometimes, applications let users provide their own regex patterns for the server to execute or build a regex with user input. A regex injection attack, or a regular expression denial of service attack (ReDoS), happens when an attacker provides a regex engine with a pattern that takes a long time to evaluate.
Thankfully, regex injection can be reliably prevented by not generating regex patterns from user input, and by constructing well-designed regex patterns whose required computing time does not grow exponentially as the text string grows.
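When a feature really does need to search for user-supplied text, one option is to treat that text as a literal rather than as a pattern. This sketch uses `re.escape` for that purpose; the function name `count_matches` is hypothetical.

```python
import re


def count_matches(user_term: str, text: str) -> int:
    # re.escape turns every metacharacter into a literal, so the user
    # supplies text to search for, never a pattern to execute.
    pattern = re.compile(re.escape(user_term))
    return len(pattern.findall(text))
```

A term such as `(a+)+$`, which could trigger heavy backtracking if compiled as a pattern, is matched here only as that exact sequence of characters.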
XPath Injection
XPath is a query language used for XML documents. Think SQL for XML. XPath is used to query and perform operations on data stored in XML documents. For example, XPath can be used to retrieve salary information of employees stored in an XML document. It can also be used to perform numeric operations or comparisons on that data.
XPath injection is an attack that injects into XPath expressions in order to alter the outcome of the query. Like SQL injection, it can be used to bypass business logic, escalate user privilege, and leak sensitive data. Since applications often use XML to communicate sensitive data across systems and web services, these are the places that are the most vulnerable to XPath injections. Similar to SQL injection, you can prevent XPath injection by using parameterized queries.
Header Injection
Header injection happens when HTTP response headers are dynamically constructed from untrusted input. Depending on which response header the vulnerability affects, header injection can lead to cross-site scripting, open redirect, and session fixation.
For instance, if the Location header can be controlled by a URL parameter, attackers can cause an open redirect by specifying their malicious site in the parameter. Attackers might even be able to execute malicious scripts on the victim’s browser, or force victims to download malware by sending completely controlled HTTP responses to the victim via header injection.
You can prevent header injections by avoiding writing user input into response headers, stripping newline characters from user input (newline characters are used to create new HTTP response headers), and using an allowlist to validate header values.
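A minimal sketch of the last two defenses, assuming a hypothetical feature that lets users pick a response language. The allowlist contents and the function name are made up for the example.

```python
ALLOWED_LANGUAGES = {"en", "fr", "de"}  # hypothetical allowlist


def content_language_header(user_choice: str) -> tuple[str, str]:
    # Line breaks would let the input start a new header or a response body.
    if "\r" in user_choice or "\n" in user_choice:
        raise ValueError("line breaks are not allowed in header values")
    # Only values the application already knows about are written out.
    if user_choice not in ALLOWED_LANGUAGES:
        raise ValueError("unsupported language")
    return ("Content-Language", user_choice)
```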
Session Injection and Insecure Cookies
Session injection is a type of header injection. If an attacker can manipulate the contents of their session cookie, or steal someone else’s cookies, they can trick the application into thinking that they are someone else. There are three main ways that an attacker can obtain someone else’s session: session hijacking, session tampering, and session spoofing.
Session hijacking refers to the attacker stealing someone else's session cookie and using it as their own. Attackers often steal session cookies with XSS or MITM (man-in-the-middle) attacks. Session tampering refers to when attackers can change their session cookie to change how the server interprets their identity. This happens when the session state is communicated in the cookie and the cookie is not properly signed or encrypted. Finally, attackers can “spoof” sessions when session IDs are predictable. If that’s the case, attackers can forge valid session cookies and log in as someone else. Preventing these session management pitfalls requires multiple layers of defense.
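Two of these layers can be sketched with the standard library: unpredictable session IDs against spoofing, and an HMAC signature against tampering. The helper names are hypothetical, and generating `SECRET_KEY` at import time is an assumption made to keep the example self-contained; a real application would load a persistent key from secure configuration.

```python
import hashlib
import hmac
import secrets
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # assumption: loaded from config in practice


def new_session_id() -> str:
    # IDs from the OS CSPRNG are not predictable, which counters session spoofing.
    return secrets.token_urlsafe(32)


def sign(value: str) -> str:
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify(cookie: str) -> Optional[str]:
    value, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return value
    return None
```

Stolen cookies (session hijacking) are not addressed by signing; that layer relies on flags such as `HttpOnly` and `Secure` and on preventing XSS.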
Host Header Poisoning
Web servers often host multiple different websites on the same IP address. After an HTTP request arrives at an IP address, the server will forward the request to the host specified in the host header. Although host headers are typically set by a user’s browser, it’s still user-provided input and thus should not be trusted.
If a web application does not validate the host header before using it to construct addresses, attackers can launch a range of attacks, like XSS, server-side request forgery (SSRF), and web cache poisoning attacks via the host header. For instance, if the application uses the host header to determine the location of scripts, the attacker could submit a malicious host header to make the application execute a malicious script:
# Vulnerable: the Host header is attacker-controlled (request is assumed to be a Flask-style request object)
script_url = "https://" + request.headers["Host"] + "/script.js"
Sensitive Data Leaks
A sensitive data leak occurs when an application fails to properly protect sensitive information, giving users access to information they shouldn’t have available to them. This sensitive information can include technical details that aid an attack, like software version numbers, internal IP addresses, sensitive filenames, and file paths. It could also include source code that allows attackers to conduct a source code review on the application. Sometimes, the application leaks private information of users, such as their bank account numbers, email addresses, and mailing addresses.
Some common ways that an application can leak sensitive technical details are through descriptive response headers, descriptive error messages with stack traces or database error messages, open directory listings on the system’s file system, and revealing comments in HTML and template files.
Authentication Bypass
Authentication refers to proving one’s identity before executing sensitive actions or accessing sensitive data. If authentication is not implemented correctly on an application, attackers can exploit these misconfigurations to gain access to functionalities they should not be able to access.
Improper Access Control
Authentication bypass issues are essentially improper access control. Improper access control occurs whenever access control in an application is improperly implemented and can be bypassed by an attacker. However, access control comprises more than authentication. While authentication asks a user to prove their identity (“Who are you?”), authorization asks the application, “What is this user allowed to do?” Proper authentication and authorization together ensure that users cannot access functionalities outside of their permissions.
There are several ways of configuring authorization for users: role-based access control, ownership-based access control, access control lists, and more.
Directory Traversal
Directory traversal vulnerabilities are another type of improper access control. They happen when attackers can view, modify, or execute files they shouldn’t have access to by manipulating file paths in user-input fields. This process involves manipulating file path variables the application uses to reference files by adding the ../ characters or other special characters to the file path. The ../ sequence refers to the parent directory of the current directory in Unix systems, so by adding it to a file path, you can often reach system files outside the web directory.
Attackers can often use directory traversals to access sensitive files like configuration files, log files, and source code. To prevent directory traversals, you should validate user input that is inserted into file paths, or avoid direct references to file names and use indirect identifiers instead.
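One way to validate such input is to resolve the final path and confirm it still sits inside the intended directory. The sketch below assumes Python 3.9 or later (for `Path.is_relative_to`); `BASE_DIR` and `resolve_upload_path` are hypothetical names.

```python
from pathlib import Path

BASE_DIR = Path("/var/www/uploads")  # hypothetical upload directory


def resolve_upload_path(filename: str) -> Path:
    base = BASE_DIR.resolve()
    candidate = (base / filename).resolve()
    # After ".." and symlinks are resolved, the result must still be inside base.
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the upload directory")
    return candidate
```

Absolute inputs such as `/etc/passwd` are rejected as well, because joining an absolute path discards the base directory.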
Arbitrary File Writes
Arbitrary file write vulnerabilities work similarly to directory traversals. If an application writes files to the underlying machine and determines the output file name via user input, attackers might be able to create arbitrary files on any path they want or overwrite existing system files. Attackers might be able to alter critical system files like password files or log files, or add their own executables into script directories.
The best way to mitigate this risk is by not creating file names based on any user input, including session information, HTTP input, or anything that the user controls. You should control the file name, path, and extension for every created file. For instance, you can generate a random alphanumeric filename every time the user needs to generate a unique file. You can also strip user input of special characters before creating the file.
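The random-filename approach might look like the following sketch, which keeps only an allowlisted extension from the user's original name and generates the rest. The extension list and the function name are assumptions for illustration.

```python
import secrets
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}  # hypothetical allowlist


def safe_output_name(original_name: str) -> str:
    # Keep only an allowlisted extension and discard the rest of the user's name.
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("file type not allowed")
    # 32 random hex characters: no separators, no user-controlled characters.
    return secrets.token_hex(16) + ext
```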
Denial of Service Attacks
Denial of service attacks, or DoS attacks, disrupt the target machine so that legitimate users cannot access its services. Attackers can launch DoS attacks by exhausting all the server’s resources, crashing processes, or making too many time-consuming HTTP requests at once.
Denial of service attacks are hard to defend against. But there are ways to minimize your risk by making it as difficult as possible for attackers. For instance, you can deploy a firewall that offers DoS protection, and prevent logic-based DoS attacks by setting limits on file sizes and disallowing certain file types.
Encryption Vulnerabilities
Encryption issues are probably among the most severe vulnerabilities that can happen in an application. Encryption vulnerabilities refer to when encryption and hashing are not properly implemented. This can lead to widespread data leaks and authentication bypass through session spoofing.
Some common mistakes developers make when implementing encryption on a site are:
- Using weak algorithms
- Using the wrong algorithm for the purpose
- Creating custom algorithms
- Generating weak random numbers
- Mistaking encoding for encryption
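To make a few of these points concrete, the sketch below stores passwords with a salted, deliberately slow key-derivation function instead of a fast general-purpose hash, and draws its salt from the `secrets` module rather than `random`. The iteration count and function names are assumptions; tune the count to your hardware and current guidance.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 600_000  # assumption: adjust to your hardware and current guidance


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    # A fresh random salt per password, from a cryptographically secure source.
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    # Constant-time comparison of the derived keys.
    return hmac.compare_digest(digest, expected)
```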
Insecure TLS Configuration and Improper Certificate Validation
Besides encrypting the information in your data stores properly, you should also make sure that your application is transmitting data properly. A good way of making sure that you are communicating over the Internet securely is to use HTTPS with a modern version of transport layer security (TLS) and a secure cipher suite.
During this process, you need to ensure that you are communicating with a trusted machine and not a malicious third party. TLS uses digital certificates as the basis of its public-key encryption, and you need to validate these certificates before establishing the connection with the third party. You should verify that the server you are trying to connect to has a certificate that is issued by a trusted certificate authority (CA) and that none of the certificates in the certificate chain are expired.
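In Python's standard library, a client context with these checks enabled can be sketched as follows. The function name is hypothetical, and requiring TLS 1.2 as the minimum version is an assumption about what counts as "modern" for your environment.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context loads the system's trusted CAs and turns on
    # certificate verification and hostname checking.
    ctx = ssl.create_default_context()
    # Refuse older protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The common mistake is the opposite of this sketch: setting `verify_mode` to `ssl.CERT_NONE` or disabling `check_hostname` to silence certificate errors.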
Mass Assignment
“Mass assignment” refers to the practice of assigning values to multiple variables or object properties all at once. Mass assignment vulnerabilities happen when the application automatically assigns user input to multiple program variables or objects. This is a feature in many application frameworks designed to simplify application development.
However, this feature sometimes allows attackers to overwrite, modify, or create new program variables or object properties at will. This can lead to authentication bypass and manipulation of program logic. To prevent mass assignments, you can disable the mass assignment feature in the framework you are using or use an allowlist to only allow assignment on certain properties or variables.
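A framework-independent sketch of the allowlist approach, using a hypothetical `User` model and `update_user` helper: only named fields are ever copied from the submitted form.

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    email: str
    is_admin: bool = False


EDITABLE_FIELDS = {"username", "email"}  # hypothetical allowlist


def update_user(user: User, form: dict) -> User:
    # Copy only allowlisted keys; anything else (e.g. is_admin) is ignored.
    for field in form.keys() & EDITABLE_FIELDS:
        setattr(user, field, form[field])
    return user
```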
Open Redirects
Websites often need to automatically redirect their users. For example, this scenario happens when unauthenticated users try to access a page that requires logging in. The website will usually redirect those users to the login page, and then return them to their original location after they are authenticated.
During an open-redirect attack, an attacker tricks the user into visiting an external site by providing them with a URL from the legitimate site that redirects somewhere else. This can lead users to believe that they are still on the original site, and help scammers build a more believable phishing campaign.
To prevent open redirects, you need to make sure the application doesn’t redirect users to malicious locations. For instance, you can disallow off-site redirects completely by validating redirect URLs. There are many other ways of preventing open redirects, like checking the referrer of requests or using page indexes for redirects. But because it’s difficult to validate URLs, open redirects remain a prevalent issue in modern web applications.
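A rough sketch of disallowing off-site redirects: accept only same-site paths and fall back to a default otherwise. The function name and the fallback value are assumptions, and the checks shown are not exhaustive, which is part of why URL validation is hard.

```python
from urllib.parse import urlsplit


def safe_redirect_target(target: str, default: str = "/") -> str:
    parts = urlsplit(target)
    # Reject anything that names a scheme or a host.
    if parts.scheme or parts.netloc:
        return default
    # Require a single leading slash; "//host" and backslash tricks are refused.
    if not target.startswith("/") or target.startswith("//") or "\\" in target:
        return default
    return target
```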
Cross-Site Request Forgery
Cross-site request forgery (CSRF) is a client-side technique used to attack other users of a web application. Using CSRF, attackers can send HTTP requests that pretend to come from the victim, carrying out unwanted actions on a victim’s behalf. For example, an attacker could change your password or transfer money from your bank account without your permission.
Unlike open redirects, there is a surefire way of preventing CSRF: using a combination of CSRF tokens and SameSite cookies and avoiding using GET requests for state-changing actions.
Server-Side Request Forgery
SSRF, or server-side request forgery, is a vulnerability that happens when an attacker is able to send requests on behalf of a server. It allows attackers to “forge” the request signatures of the vulnerable server, therefore assuming a privileged position on a network, bypassing firewall controls, and gaining access to internal services.
Depending on the permissions given to the vulnerable server, an attacker might be able to read sensitive files, make internal API calls, and access internal services like hidden admin panels. The easiest way to prevent SSRF vulnerabilities is to never make outbound requests based on user input. But if you do need to make outbound requests based on user input, you’ll need to validate those addresses before initiating the request.
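One possible validation step is to resolve the target host and refuse anything that is not a public address. The sketch below is a starting point under that assumption, with a hypothetical function name; a real deployment would also need to connect to the address it validated, since a hostname can resolve differently between the check and the request.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url: str) -> bool:
    try:
        parts = urlsplit(url)
        if parts.scheme not in {"http", "https"} or not parts.hostname:
            return False
        infos = socket.getaddrinfo(parts.hostname, None)
        # Every resolved address must be globally routable, which excludes
        # loopback, private ranges and link-local metadata endpoints.
        return bool(infos) and all(
            ipaddress.ip_address(info[4][0]).is_global for info in infos
        )
    except (socket.gaierror, UnicodeError, ValueError):
        return False
```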
Trust Boundary Violations
“Trust boundaries” refer to where untrusted user input enters a controlled environment. For instance, an HTTP request is considered untrusted input until it has been validated by the server.
There should be a clear distinction between how you store, transport, and process trusted and untrusted input. Trust boundary violations happen when this distinction is not respected, and trusted and untrusted data are confused with each other. For instance, if trusted and untrusted data are stored in the same data structure or database, the application will start confusing the two. In this case, untrusted data might be mistakenly seen as validated.
A good way to prevent trust boundary violations is to never write untrusted input into session stores until it is verified.
Published at DZone with permission of Vickie Li. See the original article here.