DZone
How to Read a Large CSV File With Java 8 and Stream API


By Eugen Hoble · Sep. 28, 2016 · Tutorial

Scenario: you have to parse a large CSV file (~90 MB), that is, read the file and create one Java object for each line. In the real-world case, the CSV file contains around 380,000 lines.

Assumption: you already know the path of the CSV file before using the code below.

The following code will read the file and create one Java object per line.

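```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

private List<YourJavaItem> processInputFile(String inputFilePath) {

    List<YourJavaItem> inputList = new ArrayList<YourJavaItem>();

    // try-with-resources closes the reader (and the underlying stream), even on error
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(new File(inputFilePath))))) {

      // skip the header of the csv, then map every remaining line to an item
      inputList = br.lines().skip(1).map(mapToItem).collect(Collectors.toList());
    } catch (IOException | UncheckedIOException e) {
      // IOException also covers FileNotFoundException;
      // lines() reports read errors as UncheckedIOException
      // .... handle the error
    }

    return inputList;
}
```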

Some explanation of the code above:

  •  lines(): returns a Stream<String> of the file's lines, read lazily as the stream is consumed.

  •  skip(1): skips the first line of the CSV file, which in this case is the header.

  •  map(mapToItem): calls the mapToItem function for each remaining line.

  •  collect(Collectors.toList()): gathers all the items created by the mapToItem function into a list.
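To see what each stage of that pipeline does in isolation, here is a small self-contained sketch that runs the same lines().skip(1).map(...).collect(...) chain over an in-memory string instead of a file. The class name StreamPipelineDemo and the first-column mapper are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamPipelineDemo {

    // Hypothetical mapper: keeps only the first column of each line
    static final Function<String, String> FIRST_COLUMN = line -> line.split(",")[0];

    // Same pipeline shape as processInputFile: lines() -> skip(1) -> map(...) -> collect(...)
    static List<String> firstColumns(BufferedReader br) {
        return br.lines()                          // lazily yields one String per line
                 .skip(1)                          // drop the header line
                 .map(FIRST_COLUMN)                // transform each remaining line
                 .collect(Collectors.toList());    // gather the results into a List
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,apple\n2,banana\n";
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            System.out.println(firstColumns(br)); // prints [1, 2]
        }
    }
}
```

Because BufferedReader accepts any Reader, the same method works unchanged whether the lines come from a file or, as here, from a string, which also makes the mapping logic easy to unit test.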

The mapToItem function looks like this:

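```java
import java.util.function.Function;

// a CSV has comma-separated values (this simple split does not handle quoted fields)
private static final String COMMA = ",";

private Function<String, YourJavaItem> mapToItem = (line) -> {

  String[] p = line.split(COMMA);

  YourJavaItem item = new YourJavaItem();

  item.setItemNumber(p[0]); //<-- this is the first column in the csv file
  // split() drops trailing empty columns, so check the length before reading p[3]
  if (p.length > 3 && p[3].trim().length() > 0) {
    item.setSomeProperty(p[3]);
  }
  //more initialization goes here

  return item;
};
```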
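One detail worth checking in a mapper like mapToItem: String.split() never returns null elements, but it does drop trailing empty strings, so a line whose last columns are empty yields a shorter array, and indexing p[3] can then throw ArrayIndexOutOfBoundsException. The sketch below uses a hypothetical Item class in place of YourJavaItem to demonstrate this and guards the index with a length check. Note also that a plain split(",") does not understand quoted fields containing commas; files with such fields need a proper CSV parser.

```java
import java.util.function.Function;

class SplitPitfallDemo {

    // Hypothetical stand-in for YourJavaItem with two of its columns
    static class Item {
        String itemNumber;
        String someProperty; // stays null when column 4 is missing or blank
    }

    static final Function<String, Item> MAP_TO_ITEM = line -> {
        String[] p = line.split(",");
        Item item = new Item();
        item.itemNumber = p[0];
        // split() removes trailing empty strings, so guard the index first
        if (p.length > 3 && !p[3].trim().isEmpty()) {
            item.someProperty = p[3];
        }
        return item;
    };

    public static void main(String[] args) {
        System.out.println("A1,x,y,z".split(",").length);      // prints 4
        System.out.println("A2,x,,".split(",").length);        // prints 2: trailing empties dropped
        System.out.println("A2,x,,".split(",", -1).length);    // prints 4: limit -1 keeps them
        System.out.println(MAP_TO_ITEM.apply("A2,x,,").someProperty); // prints null
    }
}
```

If you would rather keep every column position, split(",", -1) preserves trailing empty strings, at the cost of then checking each value for emptiness yourself.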

Performance Consideration

In my testing, reading a 90 MB CSV file this way took around 700 ms when run from inside Eclipse.

It may be even faster in production, although the actual time depends on the hardware, the JVM, and how much work mapToItem does per line.
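To get a comparable number on your own machine, a rough harness can generate a synthetic file and time the same pipeline. This is a sketch under stated assumptions: the CsvTiming class, the four-column sample data, and the use of Files.newBufferedReader with an explicit UTF-8 charset (instead of a FileInputStream wrapped in an InputStreamReader) are choices made for this example, not the author's setup. A single run without JIT warm-up gives only a ballpark figure, and 380,000 of these short rows come to roughly 11 MB rather than 90 MB.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class CsvTiming {

    // Writes a header plus `rows` synthetic data lines to a temporary file
    static Path writeSampleCsv(int rows) throws IOException {
        Path file = Files.createTempFile("sample", ".csv");
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write("id,name,price,note\n");
            for (int i = 0; i < rows; i++) {
                w.write(i + ",item" + i + "," + (i % 100) + ",some text\n");
            }
        }
        return file;
    }

    // Reads the file with the same lines().skip(1).map(...).collect(...) pipeline
    static List<String[]> readAll(Path file) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return br.lines().skip(1).map(line -> line.split(",")).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSampleCsv(380_000);
        try {
            long start = System.nanoTime();
            List<String[]> rows = readAll(file);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(rows.size() + " rows in " + elapsedMs + " ms");
        } finally {
            Files.deleteIfExists(file); // clean up the temporary file
        }
    }
}
```

Running main prints the row count and the elapsed time; the exact figure will vary from machine to machine and from run to run.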

Not bad. Happy coding!
