
Serverless Extraction and Processing of CSV Content From a Zip File With Zero Coding

This article demonstrates the implementation of a serverless function in AWS using Kumologica for unzipping and extracting CSV file content.

By Satheesh Valluru · Apr. 26, 23 · Tutorial

In the field of IT, file extraction and processing refer to the process of extracting information from various types of files, such as text files, images, videos, and audio files, and then processing that information to make it usable for a specific purpose.

File extraction involves reading and parsing the data stored in a file, which could be in a variety of formats, such as PDF, CSV, XML, or JSON, among others. Once the information is extracted, it can be processed using various techniques such as data cleansing, transformation, and analysis to generate useful insights.
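As a concrete illustration of extraction and processing, the short Node.js sketch below parses an inline CSV string into keyed records, cleansing whitespace and blank lines along the way. The data and field names here are made up purely for the example.

```javascript
// Minimal sketch of file extraction and processing in Node.js.
// The CSV content is inline for illustration; in practice it would
// be read from a file or an object store.
const csvText = [
  "name, dept, status",
  "Tom, Engineering, active",
  "", // blank line to be cleansed out
  "Jina, Finance, inactive"
].join("\n");

// Extraction: split the raw text into non-blank lines.
const [headerLine, ...rows] = csvText
  .split("\n")
  .filter(line => line.trim() !== "");

// Processing: trim whitespace and transform each row into a keyed record.
const headers = headerLine.split(",").map(h => h.trim());
const records = rows.map(row => {
  const fields = row.split(",").map(f => f.trim());
  return Object.fromEntries(headers.map((h, i) => [h, fields[i]]));
});

console.log(records);
```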

File processing is a crucial part of many IT applications, including data warehousing, business intelligence, and data analytics. For example, in the context of data warehousing, file extraction and processing are used to extract data from various sources and transform it into a format that is suitable for analysis and reporting. Similarly, in the context of business intelligence, file processing can be used to generate reports and dashboards that provide insights into key business metrics.

In this article, we will see how file extraction and processing are carried out in a serverless world. For this, we will go through a use case, design a solution on serverless infrastructure such as AWS Lambda and Amazon S3, and implement it using Kumologica Designer.

Use Case

ABC Corp is an enterprise that receives employee details from an onboarding system (system A) as a zip file containing a CSV file. The CSV file holds the employee name, dob, role, employee id, employee status, location, and phone number. System B requires the employee name, location, and employee status for access provisioning. An integration service needs to be developed to sync the data from system A to system B.

Solution

The enterprise uses AWS as its cloud provider, so we are going to build this integration solution using AWS serverless services such as AWS Lambda and Amazon S3. In this solution, system A drops a zip file containing the CSV file with the employee information into an Amazon S3 bucket (employeestore). Whenever a new zip file is detected, the integration service pulls it from that bucket, unzips it, and parses the CSV file to retrieve the necessary content. The content is then transformed into JSON, and the JSON file is placed in the sysb Amazon S3 bucket, from which system B reads the data.
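To make the data flow concrete, here is a hand-coded Node.js sketch of the parse-and-transform step that the integration performs, assuming the zip file has already been extracted to a CSV string (the S3 and unzip operations, which the Kumologica nodes will handle for us later, are omitted here). It keeps only the three fields system B needs.

```javascript
// Sketch of the CSV-to-JSON transformation, assuming the zip content
// has already been extracted to a plain CSV string.
const csv = [
  "employee_name,dob,role,employee_id,employee_status,location,phone_number",
  "Tom,08/04/87,developer,123,active,220 George st Sydney,613459034",
  "Jina,02/05/94,developer,442,inactive,10 Carmen st Melbourne,613779034"
].join("\n");

const [header, ...rows] = csv.split("\n");
const cols = header.split(",");

// Keep only the fields system B needs for access provisioning.
const wanted = ["employee_name", "location", "employee_status"];
const sysbPayload = rows.map(row => {
  const fields = row.split(",");
  const record = Object.fromEntries(cols.map((c, i) => [c, fields[i]]));
  return Object.fromEntries(wanted.map(k => [k, record[k]]));
});

// This JSON string is what would be written to the sysb bucket.
const json = JSON.stringify(sysbPayload, null, 2);
```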

Pre-Requisites

  1. An AWS cloud account with the necessary IAM access, or a user with the permissions prescribed during the Kumologica Designer installation.
  2. An AWS profile configured on your machine.
  3. Two Amazon S3 buckets created with the names employeestore and sysb, respectively.
  4. Kumologica Designer installed.
  5. A zip file containing the CSV content given below:
Plain Text
 
employee_name, dob, role, employee_id, employee_status, location, phone_number
Tom,08/04/87,developer,123,active,220 George st Sydney,613459034
Harry,01/05/89,manager,345,active,220 George st Sydney,613959034
Jina,02/05/94,developer,442,inactive,10 Carmen st Melbourne,613779034
Jack,07/01/85,Analyst,190,active,220 George st Sydney,613453333


Implementation

Let's get started with the implementation by first opening Kumologica Designer using the command kl open in a terminal or Windows command line.

1. Create a new project, providing employeesync as the project and service name.

2. Install the ZIP node into the workspace by opening a command line or terminal and navigating to your project workspace (the location of the project's package.json). Run the following npm command to install it.

Plain Text
 
npm i @kumologica/kumologica-contrib-zip


3. Restart the designer.

4. Drag and drop the EventListener node from the palette to the canvas and select AmazonS3 from the drop-down. This allows the flow to accept Amazon S3 trigger events when a file is created in the S3 bucket.

5. Add a Logger node and wire it to the EventListener node. Provide the following configuration.

 
Display Name : Log_entry
Level: INFO
Message : msg.payload
Log format : String

6. Add an S3 node and wire it to the logger. Provide the following configuration.

 
Display Name : GetEmployeeFile
Operation : GetObject
Bucket : employeestore
Key : employeeinfo.zip
RequestTimeout : 10000

7. Drop the ZIP node onto the canvas and wire it to the S3 node. Provide the following configuration.

 
Operation : Extract
Content : msg.payload.Body

8. Add a Function node and wire it to the ZIP node. Provide the following code.

JavaScript
 
// The ZIP node's Extract operation returns an array of extracted entries;
// convert the first entry's content (a Buffer) to a string of CSV text.
msg.payload = msg.payload[0].content.toString()


9. Drop a CSV node for parsing and wire it to the Function node. Provide the following configuration.

 
Display Name : CSV
Columns : employee_name, dob, role, employee_id, employee_status, location, phone_number
Separator : Comma

10. Add another S3 node for pushing the JSON content to the sysb bucket. Provide the following configuration.

 
Display Name : PushtoSysBBucket
Operation : PutObject
Bucket : sysb
Key : employeeinfo.json
Content : msg.payload

11. Finally, we will end the flow by adding the EventListenerEnd node.

 
Payload : {"status" : "completed"}

Now it's time to deploy the service.

Deployment

  1. Select the CLOUD tab on the right panel of Kumologica Designer, and select your AWS Profile.
  2. Go to the “Trigger” section under the Cloud tab and select the S3 bucket (employeestore) where the zip file is expected.
  3. Click the Deploy button.

Conclusion

Kumologica Designer simplifies the deployment process by automatically packaging the flow as a Lambda zip file and generating an AWS CloudFormation script. This script deploys the flow as a Node.js-based AWS Lambda function and creates the trigger for the S3 bucket. With this serverless service, minimal coding is required. We hope you found this article informative and look forward to sharing our next tutorial with you.

