Unlocking the Power of Streaming: Effortlessly Upload Gigabytes to AWS S3 With Node.js

This article guides you through building a Node.js application for efficient data uploads to Amazon S3, including setup, integration, and database storage.

By Vishal Diyora · Nov. 23, 23 · Tutorial
Uploading large datasets to Amazon S3 can be challenging, especially when files run to gigabytes. Streaming addresses this: instead of buffering an entire file in memory, the application forwards data to S3 in chunks as it arrives, which keeps memory use low and helps the service scale. In this article, we build a Node.js TypeScript application that uploads gigabytes of data to AWS S3 using streaming.

Setting up the Node.js Application

Let's start by setting up a new Node.js project:

Shell
 
mkdir aws-s3-upload
cd aws-s3-upload
npm init -y


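Next, install the necessary dependencies. The code in this article uses the AWS SDK v2 (the aws-sdk package), so multer-s3 is pinned to its 2.x line, which targets that SDK; multer-s3 3.x expects the AWS SDK v3 client instead. The default imports used below also require esModuleInterop to be enabled in your tsconfig.json.

Shell
 
npm install aws-sdk express multer multer-s3@2 uuid
npm install --save-dev typescript ts-node @types/node @types/express @types/multer @types/multer-s3@2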


Configuring AWS SDK and Multer

In this section, we'll configure the AWS SDK to enable communication with Amazon S3. Ensure you have your AWS credentials ready.

TypeScript
 
import express from 'express';
import { S3 } from 'aws-sdk';
import multer from 'multer';
import multerS3 from 'multer-s3';
import { v4 as uuidv4 } from 'uuid';

const app = express();
const port = 3000;

// Read credentials from the environment rather than hard-coding them in source
const s3 = new S3({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: process.env.AWS_REGION,
});


We'll also set up Multer to handle file uploads directly to S3. Define the storage configuration and create an upload middleware instance.

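TypeScript
 
const upload = multer({
  storage: multerS3({
    s3,
    bucket: 'YOUR_S3_BUCKET_NAME',
    // Detect the Content-Type from the file contents instead of trusting the client
    contentType: multerS3.AUTO_CONTENT_TYPE,
    // Makes every uploaded object publicly readable. Buckets with ACLs disabled
    // (the default for new buckets) reject this setting, so remove it in that case.
    acl: 'public-read',
    // Prefix the original name with a UUID so uploads never overwrite each other
    key: (req, file, cb) => {
      cb(null, `uploads/${uuidv4()}_${file.originalname}`);
    },
  }),
});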
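
The key callback above embeds the client-supplied file name as-is. As a minimal sketch, assuming we would rather strip directory parts and unusual characters first, a helper along these lines could build the key. The name buildObjectKey and the 100-character cap are our own assumptions, not part of multer-s3.

```typescript
import { randomUUID } from "node:crypto";
import { basename, extname } from "node:path";

/**
 * Hypothetical helper: build an S3 object key from a client-supplied file name.
 * Directory components are dropped and unusual characters are replaced, while
 * the id prefix keeps keys unique.
 */
function buildObjectKey(originalName: string, id: string = randomUUID()): string {
  // Normalize Windows separators, then keep only the last path segment
  const leaf = basename(originalName.replace(/\\/g, "/"));
  const rawExt = extname(leaf);
  const stem = leaf.slice(0, leaf.length - rawExt.length);
  // Collapse every run of disallowed characters into a single underscore
  const safeStem = stem.replace(/[^A-Za-z0-9._-]+/g, "_").slice(0, 100) || "file";
  const safeExt = rawExt.toLowerCase().replace(/[^a-z0-9.]/g, "");
  return `uploads/${id}_${safeStem}${safeExt}`;
}
```

In the Multer configuration, the key callback could then call cb(null, buildObjectKey(file.originalname)) instead of interpolating file.originalname directly.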


Creating the File Upload Endpoint

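Now, let's create a POST endpoint for handling file uploads and start the server:

TypeScript
 
app.post('/upload', upload.single('file'), (req, res) => {
  if (!req.file) {
    return res.status(400).json({ message: 'No file uploaded' });
  }

  // multer-s3 adds S3-specific fields such as `location` to the file object
  const uploadedFile = req.file as Express.MulterS3.File;
  console.log('File uploaded successfully. S3 URL:', uploadedFile.location);

  res.json({
    message: 'File uploaded successfully',
    url: uploadedFile.location,
  });
});

// Start the server; without this call the endpoint is never exposed
app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});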
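
The endpoint above only covers the success path and the missing-file case. As a rough sketch, assuming we want upload failures returned as JSON, the helper below maps the error codes Multer attaches to its errors onto HTTP statuses. The function name, the chosen statuses, and the messages are our own assumptions.

```typescript
interface UploadErrorResponse {
  status: number;
  message: string;
}

/**
 * Hypothetical helper: translate an upload failure into an HTTP response.
 * Multer reports limit violations as errors carrying a `code` string.
 */
function describeUploadError(err: { code?: string }): UploadErrorResponse {
  switch (err.code) {
    case "LIMIT_FILE_SIZE":
      return { status: 413, message: "File is too large" };
    case "LIMIT_UNEXPECTED_FILE":
      return { status: 400, message: "Unexpected file field; use the field name 'file'" };
    case "LIMIT_FILE_COUNT":
      return { status: 400, message: "Too many files in one request" };
    default:
      // Anything else (for example, a failure talking to S3) is a server error
      return { status: 500, message: "Upload failed" };
  }
}

// Possible wiring in the Express app from this article, registered after the routes:
// app.use((err: any, req: express.Request, res: express.Response, next: express.NextFunction) => {
//   const { status, message } = describeUploadError(err);
//   res.status(status).json({ message });
// });
```

Keeping the mapping in a plain function makes it easy to unit test without starting the server.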


Testing the Application

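To test the application, you can use tools like Postman or cURL. Send a multipart/form-data POST request to /upload with the file in a form field named file. Let the client generate the Content-Type header (cURL's -F option and Postman's form-data body both do this), because the header must include the multipart boundary.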
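
If you prefer a scripted check, here is a minimal sketch of a test client. It assumes Node.js 18 or later, where fetch, FormData, and Blob are available as globals; buildUploadForm and uploadFile are hypothetical names, and the URL in the example call assumes the server runs locally on port 3000. It holds the whole payload in memory, so it suits small smoke tests rather than gigabyte files.

```typescript
/** Build the multipart body; the field name must match upload.single('file'). */
function buildUploadForm(contents: string, fileName: string): FormData {
  const form = new FormData();
  form.append("file", new Blob([contents]), fileName);
  return form;
}

/** Send the form. fetch derives the Content-Type header, boundary included. */
async function uploadFile(url: string, contents: string, fileName: string): Promise<unknown> {
  const response = await fetch(url, {
    method: "POST",
    body: buildUploadForm(contents, fileName),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json();
}

// Example call (assumes the article's server is listening on localhost:3000):
// uploadFile("http://localhost:3000/upload", "hello", "hello.txt").then(console.log);
```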

Choosing Between Database Storage and Cloud Storage

Whether to store files in a database or an S3 bucket depends on your specific use case and requirements. Here's a brief overview:

Database Storage

  • Data Integrity: Ideal for ensuring data integrity and consistency between structured data and associated files, thanks to ACID transactions.
  • Security: Provides fine-grained access control mechanisms, including role-based access control.
  • File Size: Suitable for small to medium-sized files in terms of performance and storage cost.
  • Transactional workflows: Useful for applications with complex transactions involving both structured data and files.
  • Backup and recovery: Facilitates inclusion of files in database backup and recovery processes.

S3 Bucket Storage

  • Scalability: Perfect for large files and efficient file storage, scaling to gigabytes, terabytes, or petabytes of data.
  • Performance: Optimized for fast file storage and retrieval, especially for large media files or binary data.
  • Cost-efficiency: Cost-effective for large volumes of data compared to databases, with competitive pricing.
  • Simplicity: Offers straightforward file management, versioning, and easy sharing via public or signed URLs.
  • Use cases: Commonly used for storing static assets and content delivery and as a scalable backend for web and mobile file uploads.
  • Durability and availability: Ensures high data durability and availability, suitable for critical data storage.
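
One practical detail behind the scalability point: large objects reach S3 as multipart uploads, which allow at most 10,000 parts, each at least 5 MiB except the last. At 5 MiB per part, that covers roughly 48.8 GiB, so bigger files need bigger parts. The sketch below picks a part size under those two limits; choosePartSize is a hypothetical helper, and rounding to whole MiB is our own choice.

```typescript
const MIB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MIB; // S3 minimum for every part except the last
const MAX_PARTS = 10_000; // S3 maximum number of parts in one multipart upload

/**
 * Hypothetical helper: smallest part size (whole MiB, never below 5 MiB)
 * that fits `totalBytes` into at most 10,000 parts.
 */
function choosePartSize(totalBytes: number): number {
  const bytesPerPart = Math.ceil(totalBytes / MAX_PARTS);
  const roundedUp = Math.ceil(bytesPerPart / MIB) * MIB;
  return Math.max(MIN_PART_SIZE, roundedUp);
}
```

With the AWS SDK v2, a value like this could be passed as the partSize option of s3.upload(params, { partSize, queueSize }) when uploading a stream directly.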

Hybrid Approach: In some cases, metadata and references to files are stored in a database, while the actual files are stored in an S3 bucket, combining the strengths of both approaches.
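
To make the hybrid approach concrete, here is a minimal sketch of the metadata side. It assumes the upload middleware reports the bucket, key, original name, MIME type, and size for each stored file; the FileRecord shape and the in-memory store are hypothetical stand-ins for a real database table.

```typescript
/** Hypothetical row shape for a `files` table that points at objects in S3. */
interface FileRecord {
  id: string;
  bucket: string;
  key: string;
  originalName: string;
  contentType: string;
  sizeBytes: number;
  uploadedAt: Date;
}

/** Fields we assume the upload middleware reports for a stored file. */
interface StoredFile {
  bucket: string;
  key: string;
  originalname: string;
  mimetype: string;
  size: number;
}

/** Map the middleware's file object onto the database row. */
function toFileRecord(file: StoredFile, id: string, now: Date = new Date()): FileRecord {
  return {
    id,
    bucket: file.bucket,
    key: file.key,
    originalName: file.originalname,
    contentType: file.mimetype,
    sizeBytes: file.size,
    uploadedAt: now,
  };
}

/** In-memory stand-in for a database table, only to keep the sketch runnable. */
class InMemoryFileStore {
  private readonly rows = new Map<string, FileRecord>();

  save(record: FileRecord): void {
    this.rows.set(record.id, record);
  }

  findById(id: string): FileRecord | undefined {
    return this.rows.get(id);
  }
}
```

The database then holds only small, queryable rows, while the bytes stay in S3 and are located through the bucket and key columns.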

The choice should align with your application's needs, considering factors like file size, volume, performance requirements, data integrity, access control, and budget constraints.

Multer vs. Formidable — Choosing the Right File Upload Middleware

When building Node.js applications with Express, choosing a suitable file upload middleware is essential. Let's compare two popular options: Multer and Formidable.

Multer With Express

  • Express integration: Seamlessly integrates with Express for easy setup and usage.
  • Abstraction layer: Provides a higher-level abstraction for handling file uploads, reducing boilerplate code.
  • Middleware chain: Easily fits into Express middleware chains, enabling selective usage on specific routes or endpoints.
  • File validation: Supports built-in file validation, enhancing security and control over uploaded content.
  • Multiple file uploads: Handles multiple file uploads within a single request efficiently.
  • Documentation and community: Benefits from extensive documentation and an active community.
  • File renaming and storage control: Allows customization of file naming conventions and storage location.
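
As an illustration of the file validation point, Multer accepts a fileFilter callback and a limits option. The sketch below keeps the decision in a plain function so it can be tested on its own; the allow-list and the name isAllowedUpload are assumptions for this example. Note that the MIME type checked here is the one declared by the client, so this is a first line of defense rather than a guarantee.

```typescript
// Hypothetical allow-list: declared MIME type mapped to acceptable extensions
const ALLOWED_TYPES = new Map<string, string[]>([
  ["image/png", [".png"]],
  ["image/jpeg", [".jpg", ".jpeg"]],
  ["application/pdf", [".pdf"]],
]);

/** Accept a file only if its declared type is allowed and its extension agrees. */
function isAllowedUpload(mimetype: string, originalName: string): boolean {
  const extensions = ALLOWED_TYPES.get(mimetype.toLowerCase());
  if (!extensions) {
    return false;
  }
  const lowerName = originalName.toLowerCase();
  return extensions.some((ext) => lowerName.endsWith(ext));
}

// Possible wiring with Multer (storage stands for any configured storage engine):
// const upload = multer({
//   storage,
//   limits: { fileSize: 5 * 1024 * 1024 * 1024 }, // reject files above 5 GiB
//   fileFilter: (req, file, cb) => cb(null, isAllowedUpload(file.mimetype, file.originalname)),
// });
```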

Formidable With Express

  • Versatility: Works across various HTTP server environments, not limited to Express, offering flexibility.
  • Streaming: Capable of processing incoming data streams, ideal for handling huge files efficiently.
  • Customization: Provides granular control over the parsing process, supporting custom logic.
  • Minimal dependencies: Keeps your project lightweight with minimal external dependencies.
  • Widely adopted: A well-established library in the Node.js community.
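
The streaming behavior mentioned above does not depend on a particular upload library. The sketch below uses only Node.js core streams, with no Formidable, Multer, or S3 involved, to show that data moving through a pipeline is handled one chunk at a time. The names fakeUpload and measureStream are hypothetical, and the discarding sink stands in for the real destination.

```typescript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

/** Produce `totalBytes` of zero-filled data in `chunkSize` pieces, on demand. */
function fakeUpload(totalBytes: number, chunkSize: number): Readable {
  let remaining = totalBytes;
  return new Readable({
    read() {
      if (remaining <= 0) {
        this.push(null); // signal end of stream
        return;
      }
      const size = Math.min(chunkSize, remaining);
      remaining -= size;
      this.push(Buffer.alloc(size));
    },
  });
}

interface StreamStats {
  totalBytes: number;
  chunks: number;
  largestChunk: number;
}

/** Drain a stream chunk by chunk and record how large each piece was. */
async function measureStream(source: Readable): Promise<StreamStats> {
  const stats: StreamStats = { totalBytes: 0, chunks: 0, largestChunk: 0 };
  const meter = new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      stats.totalBytes += chunk.length;
      stats.chunks += 1;
      stats.largestChunk = Math.max(stats.largestChunk, chunk.length);
      callback(null, chunk); // pass the chunk along unchanged
    },
  });
  // Stand-in for the real destination; it simply discards the data
  const sink = new Writable({
    write(_chunk, _encoding, callback) {
      callback();
    },
  });
  await pipeline(source, meter, sink);
  return stats;
}
```

For example, measureStream(fakeUpload(10 * 1024 * 1024, 64 * 1024)) moves 10 MiB through the pipeline as 160 chunks of 64 KiB, so no single step ever holds the full payload.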

Choose between Multer and Formidable based on your project's requirements and your familiarity with each library. Multer is a good fit when you want seamless integration with Express, built-in validation, and a straightforward setup. Formidable is preferable when you need more customization, support for servers other than Express, or finer control over streaming large files.

Conclusion

In conclusion, this article has demonstrated how to develop a Node.js TypeScript application that uploads large datasets to Amazon S3 using streaming. Streaming is a memory-efficient and scalable approach, especially when dealing with gigabytes of data. By following the steps outlined in this guide, you can improve your upload handling and build more robust applications.

