DZone
Apache Beam Working With Files

This article explains how to read data from and write data to files in Apache Beam, using a pipeline in which the ‘Employees.csv’ file is read, filtered, and written to a new file.

By Sameer Shukla · DZone Core · Feb. 16, 22 · Tutorial


Introduction

The article explains how to read data from and write data to files in Apache Beam, with a complete pipeline example. Data is read from a file with the ‘ReadFromText’ transform and written to a new file with the ‘WriteToText’ transform. The first part of the article shows how to read from a file and how to write to one; the latter part builds a pipeline that reads the ‘Employees.csv’ file, filters the rows by age, extracts each employee’s first name, last name, and age, and writes the result to a new file. Overall, the pipeline looks like this:

Data pipeline

Reading From a File

For this article, we downloaded a 100-record file from the site and named it ‘Employees.csv’. The ‘ReadFromText’ transform reads the file from disk, as the code below shows:

Code for reading from a file


Output

Code output for reading from a file


Writing to a File

The ‘WriteToText’ transform writes data to a file. The program below reads the data from the input file and writes it to the ‘out.csv’ file.

Code for writing to a file

Output

Code output for writing to a file


Pipeline

The pipeline code contains two functions: the first filters the rows, keeping those where the employee’s age is greater than 40, and the second maps each row to only the employee’s first name, last name, and age.

Pipeline code for filtering and mapping data
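The two functions can be sketched in plain Python as follows. The article accesses fields by index but does not list the CSV layout, so the column positions (`FIRST_NAME_IDX`, `LAST_NAME_IDX`, `AGE_IDX`) and all function names here are hypothetical; adjust them to your file. Parsing with the standard `csv` module rather than `str.split(",")` keeps quoted fields that contain commas intact.

```python
import csv
import io

# Hypothetical column positions: the article reads fields by index but does not
# show the CSV layout, so change these to match the header of your Employees.csv.
FIRST_NAME_IDX = 1
LAST_NAME_IDX = 2
AGE_IDX = 3


def parse_line(line: str) -> list:
    """Split one CSV line into its fields; csv.reader keeps quoted commas intact."""
    return next(csv.reader([line]))


def is_older_than_40(record: list) -> bool:
    """Filter predicate: True when the employee's age is greater than 40."""
    return int(record[AGE_IDX]) > 40


def extract_name_and_age(record: list) -> str:
    """Map function: keep only first name, last name and age as one CSV line."""
    buf = io.StringIO()
    # lineterminator="" stops csv.writer from appending "\r\n" to the row.
    csv.writer(buf, lineterminator="").writerow(
        [record[FIRST_NAME_IDX], record[LAST_NAME_IDX], record[AGE_IDX]]
    )
    return buf.getvalue()
```

In a pipeline, `is_older_than_40` would be passed to `beam.Filter`, which keeps the elements for which it returns `True`, and `extract_name_and_age` to `beam.Map`, after a `beam.Map(parse_line)` step has turned each line into a list of fields.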

Both functions access the fields of each record by index. The full pipeline code is shown below:

Full Pipeline code


Contents of Generated File

Generated file content

Summary

In this article, we explored how to read data from and write data to files in Apache Beam, and walked through a full pipeline that filters and maps the data and writes the result to a new file.


Opinions expressed by DZone contributors are their own.
