How To Protect Node.js Form Uploads With a Deterministic Threat Detection API
Learn more about the prevalence of custom file upload forms in web applications and a deterministic threat detection solution for Node.js form uploads.
It’s increasingly common to see web applications incorporate custom file upload forms, and popular runtime environments like Node.js have played a noteworthy role in making this possible. This has, in turn, converted form upload entry points into a burgeoning attack vector, as threat actors are now incentivized to exploit insecure form uploads in targeted attacks using specially crafted malicious files.
In this article, we’ll briefly examine why the popularity of custom form upload handlers has increased in recent years, and we’ll subsequently look at a deterministic threat detection API that can help protect a Node.js form upload application.
Defining File Upload Forms
When we talk about “file upload forms” in this article, we’re referring to HTML web forms that allow users to select and upload files from their computer (or device) to a web server. The form itself is composed of basic HTML elements, and it can simultaneously collect files and text-input data before sending that collection to a web server as multipart/form-data HTTP content. That collection of data is subsequently processed by a server-side application, which determines where each piece of data (text or file bytes) should go – among other things.
Here’s a rudimentary example of an HTML form that captures a user's first name, last name, and email address in addition to capturing a file from their file system:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>File Upload Form</title>
  </head>
  <body>
    <h2>Upload File Form</h2>
    <form action="/upload" method="POST" enctype="multipart/form-data">
      <label for="firstName">First Name:</label><br>
      <input type="text" id="firstName" name="firstName" required><br><br>
      <label for="lastName">Last Name:</label><br>
      <input type="text" id="lastName" name="lastName" required><br><br>
      <label for="email">Email Address:</label><br>
      <input type="email" id="email" name="email" required><br><br>
      <label for="file">Select File:</label><br>
      <input type="file" id="file" name="file" accept=".pdf,.docx" required><br><br>
      <input type="submit" value="Upload File">
    </form>
  </body>
</html>
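When this form is submitted, the browser packages every field into a single multipart/form-data request body, with the parts separated by a boundary string. The sketch below assembles such a body by hand to show that wire format. buildMultipartBody is a hypothetical helper, the fixed boundary stands in for the random one a browser would generate, and the file content is shown as text even though real uploads carry raw bytes.

```javascript
// Hypothetical helper that assembles a multipart/form-data body by hand,
// illustrating what a browser sends when the form above is submitted.
// A fixed boundary is used for readability; browsers generate a random one.
function buildMultipartBody(boundary, fields, file) {
  const CRLF = '\r\n';
  const parts = [];

  // Each text input becomes its own part with a Content-Disposition header.
  for (const [name, value] of Object.entries(fields)) {
    parts.push(
      `--${boundary}${CRLF}` +
      `Content-Disposition: form-data; name="${name}"${CRLF}${CRLF}` +
      `${value}${CRLF}`
    );
  }

  // The file part also carries a filename and Content-Type chosen by the client.
  parts.push(
    `--${boundary}${CRLF}` +
    `Content-Disposition: form-data; name="${file.field}"; filename="${file.name}"${CRLF}` +
    `Content-Type: ${file.type}${CRLF}${CRLF}` +
    `${file.content}${CRLF}`
  );

  // The closing delimiter is the boundary followed by two hyphens.
  return parts.join('') + `--${boundary}--${CRLF}`;
}

const body = buildMultipartBody(
  'ExampleBoundary123',
  { firstName: 'Ada', lastName: 'Lovelace', email: 'ada@example.com' },
  { field: 'file', name: 'resume.pdf', type: 'application/pdf', content: '%PDF-1.7 ...' }
);
console.log(body);
```

Note that the filename and Content-Type values in the file part come straight from the client, which is why a server-side handler should not rely on them alone to decide what a file really is.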
The Rise of Form Uploads
The implementation of an HTML form is extremely simple, and the benefits of incorporating one in just about any web application – whether that be an e-commerce app, social media app, resume upload portal, or even an insurance claims portal – are easy to understand. User data and user-generated content are twinned kings of the digital age, and form uploads capture both in one fell swoop.
Putting the business benefits aside, we can attribute the increased viability of custom form uploads to a few important technology-related factors.
On the one hand, we can point to the steadily increasing availability of cloud computing. It’s never been more affordable to buy cloud storage on a “pay-as-you-go” basis, and cloud storage has gone from an “exciting new concept” to an “industry standard model” in what seems like no time at all. It’s easy for startups and growing businesses of all shapes and sizes to store user data and user-generated content on scalable infrastructure hosted by large public cloud providers.
On the other hand, building server-side form upload handlers (or server-side applications of any kind, for that matter) has never been easier, especially with the availability and popularity of an open-source, cross-platform JavaScript runtime environment like Node.js.
Node.js is more or less immediately accessible to JavaScript developers (one of the largest developer communities in the world), removing a significant barrier to entry in server-side development that existed as recently as 15 years ago.
Compared, for instance, to an equivalent undertaking in .NET or Java, building a form upload handler (i.e., a server-side application that handles multipart/form-data inputs from an HTML form) in Node.js is relatively straightforward. It largely hinges on installing exceedingly popular, open-source, and easy-to-use packages like the Express.js web framework and the Multer middleware.
Using npm (the default package manager for Node.js), we can simply run commands like npm install express and npm install multer and then import those modules when we set up our server:
const express = require('express');
const multer = require('multer');
While Express.js is a flexible framework that offers a robust set of features for all sorts of web (and mobile) applications, Multer exclusively handles multipart/form-data requests, and the two integrate seamlessly.
Multer can handle multiple files uploaded through different fields in a single form, and it’s easy to tie a Multer upload handler into Node.js code that sends files to a cloud storage service (e.g., Amazon S3 or Azure Blob Storage). Much of Multer’s efficiency can be attributed to the fact that it’s written on top of Busboy, a module designed to parse incoming HTML form data.
A Node.js form handler sending data to a cloud storage bucket might look something like the following generic example. Once our own AWS credentials, region, and bucket name are supplied, it would send file uploads from a client-side HTML form to an AWS S3 storage bucket:
const express = require('express');
const multer = require('multer');
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const path = require('path');
const app = express();
const port = process.env.PORT || 3000; // Replace with your own port if needed
// This configures the AWS SDK
AWS.config.update({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: process.env.AWS_REGION
});
const s3 = new AWS.S3();
// Here we configure Multer to keep each upload in memory as req.file.buffer
const storage = multer.memoryStorage();
const upload = multer({ storage: storage });
// Here we serve the HTML form
app.get('/', (req, res) => {
  res.sendFile(path.join(__dirname, 'index.html'));
});
// Here we provide the route to handle the file upload ('file' matches the form's input name)
app.post('/upload', upload.single('file'), (req, res) => {
  if (!req.file) {
    return res.status(400).send('No file uploaded.');
  }
  // Here we define some parameters for upload to our AWS S3 bucket
  const params = {
    Bucket: process.env.S3_BUCKET_NAME,
    Key: Date.now() + '-' + req.file.originalname,
    Body: req.file.buffer
  };
  // Here we actually upload our file to the S3 bucket
  s3.upload(params, (err, data) => {
    if (err) {
      console.error(err);
      return res.status(500).send('Error uploading file.');
    }
    res.send(`File uploaded successfully. ${data.Location}`);
  });
});
// Here we start the Express.js server and ask it to listen for incoming HTTP requests on a specific port
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
Please note that this example is intentionally simplistic for demonstration purposes. Among a few other practical issues, this code lacks robust error handling, places no size limit on files buffered in memory, and passes AWS access keys to the SDK explicitly rather than relying on a safer credential source such as an IAM role. It also assumes the environment variables for our AWS account have already been set elsewhere.
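Another simplification worth noting is that the example builds the S3 object key from req.file.originalname, a value chosen entirely by the client. Below is a minimal sketch of one alternative, assuming we only want to keep a short alphanumeric extension from the original name. buildObjectKey and the uploads/ prefix are hypothetical names, not part of Multer or the AWS SDK.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper: builds an S3 object key without embedding the
// client-supplied filename, which may contain path segments or odd characters.
function buildObjectKey(originalName) {
  // path.extname looks only at the last segment; lowercase it for consistency.
  const ext = path.extname(String(originalName)).toLowerCase();
  // Keep the extension only if it is short and strictly alphanumeric.
  const safeExt = /^\.[a-z0-9]{1,10}$/.test(ext) ? ext : '';
  // A timestamp plus a random UUID avoids collisions and attacker-chosen names.
  return `uploads/${Date.now()}-${crypto.randomUUID()}${safeExt}`;
}

console.log(buildObjectKey('Quarterly Report.PDF'));
console.log(buildObjectKey('../../evil.<script>'));
```

In the route above, Key: buildObjectKey(req.file.originalname) could stand in for the original expression; if the original filename matters for display, it can be stored separately as metadata rather than used as the key.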
Understanding the File Upload Form Attack Vector
Once we build a Node/Express/Multer server application to capture file uploads, the more challenging problem becomes:
- Scanning those files for traditional threats (e.g., viruses and malware)
- Verifying the contents of those files for non-malware content threats (e.g., executables, scripts, macros, etc.)
At a high level, these are the two main groups of threats we can expect to face through a file upload entry point – though it’s important to note that file upload threats, like any cybersecurity threat, can differ drastically in complexity and concealment.
While files containing viruses and malware can often be detected by comparing file signatures with known malware “families,” or by actively analyzing the intended behavior of a file’s instructions in a threat sandbox, other types of threats are often more difficult to identify outright. File uploads can masquerade as one file type while containing the contents of another, and most traditional antivirus (AV) solutions are poorly equipped to identify threats in this scenario.
Let’s say, for example, a fairly sophisticated threat actor decides to launch a targeted attack on our startup’s file upload portal. Their goal is to upload a specially crafted fake PDF (composed of HTML/JavaScript) to our cloud storage instance. Eventually, when a user opens the PDF in their browser (e.g., to review the document’s contents), the fake PDF will execute code that downloads malicious content onto the user’s device from a remote server. This fake PDF carries a valid .pdf extension, and it does not contain any known malware signature; the code is written from scratch.
We can’t rely on an AV solution to identify this threat, because no known viruses or malware are involved. We can’t rely on our basic cloud storage subscription to deep-verify the PDF content for threats, either, as advanced threat detection likely lies outside the scope of an affordable cloud storage subscription.
Deterministic Threat Detection
One way we can mitigate this threat is by incorporating a deterministic content verification solution into our server-side file upload handler. Deterministic threat detection is characterized by predefined rules; in this case, that means we’ll decide ahead of time which content can and cannot pass through our server application into cloud storage (or any storage location), ensuring unsuspecting users never gain access to a disallowed document.
If we were to deterministically verify the contents of the JavaScript injection PDF described in the earlier example, we could compare the file’s contents against the PDF format specification and determine that the alleged PDF did not conform to it, despite carrying a valid .pdf extension. After making this assessment, we could quarantine the file for analysis (to better understand the threats entering through our file upload portal), or we could simply delete the file outright and return a generic error message to the client-side attacker.
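To illustrate what a predefined rule can look like, here is a deliberately minimal sketch of one deterministic check: comparing a file’s leading bytes with the signature its extension implies. PDF files begin with %PDF-, and .docx files are ZIP containers beginning with the bytes PK\x03\x04. The SIGNATURES table and matchesClaimedFormat function are hypothetical. A check like this is far weaker than full format verification: a crafted file can begin with the right bytes and still be malformed or carry active content, and any ZIP archive would pass the .docx rule.

```javascript
const path = require('path');

// Hypothetical signature table for the two formats our form accepts.
const SIGNATURES = {
  '.pdf': Buffer.from('%PDF-', 'ascii'),           // PDF header
  '.docx': Buffer.from([0x50, 0x4b, 0x03, 0x04]),  // "PK\x03\x04" ZIP local file header
};

// One deterministic rule: the leading bytes must match the signature implied
// by the file extension. Unknown extensions and too-short files fail closed.
function matchesClaimedFormat(fileName, bytes) {
  const ext = path.extname(fileName).toLowerCase();
  const signature = SIGNATURES[ext];
  if (!signature || bytes.length < signature.length) {
    return false;
  }
  return bytes.subarray(0, signature.length).equals(signature);
}

console.log(matchesClaimedFormat('report.pdf', Buffer.from('%PDF-1.7\n1 0 obj\n')));
console.log(matchesClaimedFormat('report.pdf', Buffer.from('<html><script>alert(1)</script></html>')));
```

Requiring the header at offset 0 is itself a policy choice, since some PDF readers tolerate a few bytes before the header; a strict rule errs on the side of rejecting such files.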
In the demonstration below, we’ll look at one free-to-use threat detection API that integrates easily into our Node.js form upload handler. It combines deterministic content verification and signature-based virus scanning to provide a flexible threat detection solution for our Node.js application.
Demonstration
Using the ready-to-run Node.js code examples provided below, we can structure our threat detection API call in a few quick steps. We can authorize our API calls with a free API key, and we can install the client SDK with a single npm command.
Let’s take care of that now. We can run the following command to install the client SDK:
npm install cloudmersive-virus-api-client --save
Alternatively, we can add the Node client to our package.json:
"dependencies": {
  "cloudmersive-virus-api-client": "^1.1.9"
}
Next, we can use the code example below as the basis for structuring our API call within our Node.js application, replacing the ‘YOUR API KEY’ placeholder string with our own API key:
var fs = require('fs');
var CloudmersiveVirusApiClient = require('cloudmersive-virus-api-client');
var defaultClient = CloudmersiveVirusApiClient.ApiClient.instance;
// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';
var apiInstance = new CloudmersiveVirusApiClient.ScanApi();
// File | Input file to perform the operation on. fs.readFileSync already returns a Buffer
// (inside the Multer route shown earlier, req.file.buffer could be passed instead).
var inputFile = fs.readFileSync("C:\\temp\\inputfile");
// Every allow* option below is set to false (the recommended setting), so that content category is treated as a threat.
var opts = {
  'allowExecutables': false, // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
  'allowInvalidFiles': false, // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
  'allowScripts': false, // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
  'allowPasswordProtectedFiles': false, // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
  'allowMacros': false, // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowXmlExternalEntities': false, // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowInsecureDeserialization': false, // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowHtml': false, // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature, the default is true for backward compatibility].
  'restrictFileTypes': ".pdf,.docx" // String | Comma-separated list of file formats to allow as clean; e.g., .pdf,.docx,.png would allow only PDF, Word, and PNG files. All files must pass content verification against this list; if they do not, the result is returned as CleanResult=false. Set to null or an empty string to disable (default). Here it mirrors the formats our HTML form accepts.
};
var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    // Serialize the result object so its fields are readable in the log
    console.log('API called successfully. Returned data: ' + JSON.stringify(data));
  }
};
apiInstance.scanFileAdvanced(inputFile, opts, callback);
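The client shown above reports its result through a Node-style callback. If we want to await the scan inside our Express upload route, one option is to wrap the call in a Promise. The sketch below assumes only the scanFileAdvanced(inputFile, opts, callback) call shape used in the example above; scanBuffer and stubApi are hypothetical, and the stub exists purely so the wrapper can be exercised without a network call or API key.

```javascript
// Wraps a callback-style scanFileAdvanced(inputFile, opts, callback) method
// in a Promise. The client object is passed in, so a stub can replace it.
function scanBuffer(apiInstance, buffer, opts) {
  return new Promise((resolve, reject) => {
    apiInstance.scanFileAdvanced(buffer, opts, (error, data) => {
      if (error) {
        reject(error);
      } else {
        resolve(data);
      }
    });
  });
}

// Hypothetical stub standing in for the real ScanApi instance. It answers
// asynchronously and only checks for a PDF header; the real service does far more.
const stubApi = {
  scanFileAdvanced(inputFile, opts, callback) {
    const looksLikePdf = inputFile.subarray(0, 5).toString('ascii') === '%PDF-';
    setImmediate(() => {
      callback(null, { CleanResult: looksLikePdf, ContainsInvalidFile: !looksLikePdf });
    });
  },
};

scanBuffer(stubApi, Buffer.from('<html>'), {}).then((result) => console.log(result));
```

Inside the upload route, a call such as await scanBuffer(apiInstance, req.file.buffer, opts) could run before s3.upload, wrapped in try/catch, because Express 4 does not forward rejected promises from async handlers to its error middleware. Node’s util.promisify would also work here, provided the method is bound to apiInstance first.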
By default, the underlying deterministic content verification model will identify a range of content types as a threat, including executables, invalid files, macros, scripts, password-protected files (commonly used to disguise threats via encryption), HTML, Object Linking and Embedding (OLE) objects, and other insecure content. We can customize these threat rules through the request options shown above, setting an individual rule to true if our application legitimately needs to accept that type of content.
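We can also include a comma-separated list of acceptable file formats in the ‘restrictFileTypes’ parameter to limit our file upload threat surface. It’s worth noting we can do something similar using Multer’s ‘fileFilter’ option if we’d prefer, though fileFilter only sees the client-supplied filename and MIME type, whereas ‘restrictFileTypes’ verifies the file’s actual contents against each listed format.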
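For comparison, here is a minimal sketch of a Multer fileFilter callback that mirrors the form’s accept=".pdf,.docx" attribute. The ALLOWED map and the rule requiring the declared MIME type to match the extension are assumptions for illustration. Because both originalname and mimetype come from the client, this is only a cheap first filter, not a substitute for content verification.

```javascript
const path = require('path');

// Hypothetical allowlist: extension -> MIME type we expect the browser to declare.
const ALLOWED = new Map([
  ['.pdf', 'application/pdf'],
  ['.docx', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'],
]);

// Follows Multer's fileFilter(req, file, cb) signature. At this stage Multer
// only has the client-supplied metadata (originalname, mimetype), not the bytes.
function fileFilter(req, file, cb) {
  const ext = path.extname(file.originalname).toLowerCase();
  const expectedType = ALLOWED.get(ext);
  // cb(null, true) accepts the file; cb(null, false) silently skips it.
  cb(null, expectedType !== undefined && file.mimetype === expectedType);
}

// Usage with Multer (not executed here):
// const upload = multer({ storage: multer.memoryStorage(), fileFilter });
```

With this filter in place, a rejected file is skipped and req.file stays undefined, so the route shown earlier would answer with its existing 400 response. Browsers do not always report the MIME type we expect, so a strict match may occasionally reject a legitimate upload.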
Let’s review an example response object. The following response was generated from processing an inert JavaScript injection PDF file (this file was designed for testing; it simply displays the message “you’ve been hacked!” when opened in a web browser):
{
  "CleanResult": false,
  "ContainsExecutable": false,
  "ContainsInvalidFile": true,
  "ContainsScript": false,
  "ContainsPasswordProtectedFile": false,
  "ContainsRestrictedFileFormat": false,
  "ContainsMacros": false,
  "ContainsXmlExternalEntities": false,
  "ContainsInsecureDeserialization": false,
  "ContainsHtml": false,
  "ContainsUnsafeArchive": false,
  "ContainsOleEmbeddedObject": false,
  "VerifiedFileFormat": ".pdf",
  "FoundViruses": null,
  "ContentInformation": {
    "ContainsJSON": false,
    "ContainsXML": false,
    "ContainsImage": false,
    "RelevantSubfileName": null
  }
}
The “CleanResult”: false value indicates the file contains a threat, and the “ContainsInvalidFile”: true value tells us why. Notice that the “VerifiedFileFormat” string still identifies ".pdf" as the file type, even though the document was flagged as invalid; in other words, the file resembled a PDF closely enough to be treated as one, and it would have executed in our browser (or, potentially, in a vulnerable PDF rendering/processing application).
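Once the scan returns, our upload route still has to decide what to do with the file. The sketch below is one hypothetical policy, decideUploadAction, that stores a file only when CleanResult is true and otherwise quarantines it (the quarantine option discussed earlier), while recording which flags triggered the decision. It assumes the result object uses the same top-level field names as the response above.

```javascript
// Hypothetical policy helper: maps an advanced scan result (shaped like the
// response above) to an action for the upload route.
function decideUploadAction(result) {
  if (result && result.CleanResult === true) {
    return { action: 'store', reasons: [] };
  }
  const safeResult = result || {};
  // Collect every top-level "Contains..." flag that came back true.
  const reasons = Object.keys(safeResult).filter(
    (key) => key.startsWith('Contains') && safeResult[key] === true
  );
  // FoundViruses is null in the sample response when no virus matched.
  if (Array.isArray(safeResult.FoundViruses) && safeResult.FoundViruses.length > 0) {
    reasons.push('FoundViruses');
  }
  // Fail closed: anything not explicitly clean is quarantined for analysis.
  return { action: 'quarantine', reasons };
}

console.log(decideUploadAction({ CleanResult: false, ContainsInvalidFile: true, FoundViruses: null }));
```

Failing closed, so that anything other than an explicit CleanResult: true is treated as unsafe, means an unexpected or partial response never lets a file through to cloud storage. Swapping the quarantine action for deletion plus a generic error message would implement the other option described earlier.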
Conclusion
In this article, we’ve taken a high-level look at the growing popularity of file upload forms, the increasing viability of server-side form upload handlers (thanks to runtime environments like Node.js), and one example of a targeted file upload attack designed to exploit an insecure file upload form in a web application. Finally, we walked through a quick demonstration of a deterministic threat detection API that can help protect our Node.js form from disguised malicious content.
Opinions expressed by DZone contributors are their own.