How to Read a Large CSV File with Java 8 and Stream API

Scenario: you have to parse a large CSV file (~90 MB), that is, read the file and create one Java object for each line. In the real-life case, the CSV file contains around 380,000 lines.

Assumption: you already know the path of the CSV file before using the code below.

The following code reads the file and creates one Java object per line.

private List<YourJavaItem> processInputFile(String inputFilePath) {

    List<YourJavaItem> inputList = new ArrayList<>();

    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFilePath)))) {

      // skip the header of the csv
      inputList = br.lines().skip(1).map(mapToItem).collect(Collectors.toList());
    } catch (IOException e) {
      // FileNotFoundException is a subclass of IOException, so a single catch
      // covers both; listing them together in a multi-catch does not compile
      ....
    }

    return inputList;
}

A few notes on the code above:

lines(): returns a Stream<String> whose elements are the lines of the file, read lazily as the stream is consumed.

skip(1): skips the first line in the CSV file, which in this case is the header of the file.

map(mapToItem): calls the mapToItem function for each remaining line in the file.

collect(Collectors.toList()): creates a list containing all the items created by the mapToItem function.
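
To see the four calls working together without a real file, here is a minimal sketch that runs the same chain over an in-memory string. The class name PipelineSketch, the helper firstColumns, and the sample data are made up for illustration, and a simple lambda stands in for mapToItem.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {

    // Runs the same lines/skip/map/collect chain over any reader.
    static List<String> firstColumns(BufferedReader br) {
        return br.lines()                          // lazy Stream<String>, one element per line
                 .skip(1)                          // drop the header line
                 .map(line -> line.split(",")[0])  // stand-in for mapToItem: keep column 0
                 .collect(Collectors.toList());    // terminal operation: consumes the stream
    }

    public static void main(String[] args) throws Exception {
        String csv = "id,name\n1,apple\n2,pear\n";   // made-up sample data
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            System.out.println(firstColumns(br));    // prints [1, 2]
        }
    }
}
```

Nothing is read until collect runs, which is why the chain has to finish before the reader is closed.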

The mapToItem function looks like this:

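// assumption: COMMA is not defined in the snippet; a plain comma is the natural value
private static final String COMMA = ",";

private Function<String, YourJavaItem> mapToItem = (line) -> {

  String[] p = line.split(COMMA);// a CSV has comma separated lines

  YourJavaItem item = new YourJavaItem();

  item.setItemNumber(p[0]);//<-- this is the first column in the csv file
  // split() never yields null elements, but it drops trailing empty columns,
  // so check the array length before reading p[3]
  if (p.length > 3 && !p[3].trim().isEmpty()) {
    item.setSomeProperty(p[3]);
  }
  //more initialization goes here

  return item;
};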
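
One detail worth checking in a mapper like this: String.split(",") drops trailing empty strings, so a row whose last columns are empty yields a shorter array, and reading p[3] can then throw an ArrayIndexOutOfBoundsException. The sketch below shows the behavior on a made-up row; the class SplitSketch and the helper column are hypothetical names, and, like the article's code, it assumes fields never contain quoted commas.

```java
class SplitSketch {

    // Returns column `index` of a comma-separated line, or "" when it is missing or blank.
    static String column(String line, int index) {
        // limit -1 keeps trailing empty columns; the default split(",") drops them
        String[] p = line.split(",", -1);
        return index < p.length ? p[index].trim() : "";
    }

    public static void main(String[] args) {
        String line = "42,widget,,";   // made-up row whose last two columns are empty
        System.out.println(line.split(",").length);      // prints 2
        System.out.println(line.split(",", -1).length);  // prints 4
        System.out.println(column(line, 0));             // prints 42
        System.out.println(column(line, 3).isEmpty());   // prints true
    }
}
```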

Performance Consideration

From the testing I've done, reading a 90 MB CSV file with the approach described above takes around 700 ms when running from inside Eclipse.

It is probably even faster in production.
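
To get a comparable number on your own machine, a rough timing harness could look like the sketch below. Everything in it is an assumption for illustration: the class TimingSketch, the synthetic three-column file (far smaller than 90 MB), and the use of Files.newBufferedReader in place of the FileInputStream chain. A single cold run like this also includes JVM warm-up, so treat the result as an order-of-magnitude figure only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TimingSketch {

    // Reads the file with the lines/skip/map/collect chain and returns the data rows.
    static List<String[]> readRows(Path csv) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            return br.lines().skip(1).map(l -> l.split(",")).collect(Collectors.toList());
        }
    }

    // Writes a header plus `rows` synthetic lines to a temporary file.
    static Path writeSample(int rows) throws IOException {
        Path csv = Files.createTempFile("sample", ".csv");
        List<String> lines = IntStream.rangeClosed(0, rows)
                .mapToObj(i -> i == 0 ? "id,name,price" : i + ",item" + i + ",9.99")
                .collect(Collectors.toList());
        Files.write(csv, lines, StandardCharsets.UTF_8);
        return csv;
    }

    public static void main(String[] args) throws IOException {
        Path csv = writeSample(380_000);   // same line count as the article's file
        try {
            long start = System.nanoTime();
            List<String[]> rows = readRows(csv);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(rows.size() + " rows in " + elapsedMs + " ms");
        } finally {
            Files.deleteIfExists(csv);     // clean up the temporary file
        }
    }
}
```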

Not bad. Happy coding!
