How to Read a Large CSV File with Java 8 and Stream API

Scenario: you have to parse a large CSV file (~90 MB), which in practice means reading the file and creating one Java object for each line. In the real-world case, the CSV file contains around 380,000 lines.

Assumption: you already know the path of the CSV file before using the code below.

The following code reads the file and creates one Java object per line.

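import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

private List<YourJavaItem> processInputFile(String inputFilePath) {

    List<YourJavaItem> inputList = new ArrayList<>();

    // try-with-resources closes the reader (and the underlying stream), even on error
    try (BufferedReader br = new BufferedReader(
             new InputStreamReader(new FileInputStream(new File(inputFilePath))))) {

      // skip the header of the csv, then map every remaining line to an item
      inputList = br.lines().skip(1).map(mapToItem).collect(Collectors.toList());
    } catch (IOException e) {
      // FileNotFoundException is a subclass of IOException, so this also covers a missing file
      // ... (error handling omitted)
    }

    return inputList;
}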

A few notes on the code above:

lines(): returns a Stream<String> with one element per line of the file; the lines are read lazily as the stream is consumed.

skip(1): skips the first line in the CSV file, which in this case is the header of the file.

map(mapToItem): calls the mapToItem function for each line in the file.

collect(Collectors.toList()): creates a list containing all the items created by the mapToItem function.
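To see these stream steps in isolation, here is a minimal, self-contained sketch that runs the same lines().skip(1).map(...).collect(...) chain over an in-memory string instead of a file. The Item class, the MAP_TO_ITEM mapper and the two-column layout are hypothetical stand-ins for YourJavaItem and mapToItem, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CsvPipelineDemo {

    // Hypothetical stand-in for YourJavaItem: just two columns.
    static final class Item {
        final String number;
        final String name;

        Item(String number, String name) {
            this.number = number;
            this.name = name;
        }

        @Override
        public String toString() {
            return number + ":" + name;
        }
    }

    // Hypothetical mapper: one CSV line -> one Item (naive split, no quote handling).
    static final Function<String, Item> MAP_TO_ITEM = line -> {
        String[] p = line.split(",", -1);
        return new Item(p[0], p.length > 1 ? p[1] : "");
    };

    // Same pipeline shape as processInputFile, but reading from a String.
    static List<Item> parse(String csv) {
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            return br.lines()           // Stream<String>, one element per line
                     .skip(1)           // drop the header row
                     .map(MAP_TO_ITEM)  // build one Item per remaining line
                     .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "number,name\n1,apple\n2,banana\n";
        System.out.println(parse(csv)); // prints [1:apple, 2:banana]
    }
}
```

Because collect() runs inside the try block, the whole stream is consumed before the reader is closed. Returning the un-collected stream from inside the try block instead would fail as soon as the caller tried to read from the already-closed reader.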

The mapToItem function looks like this:

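import java.util.function.Function;

// COMMA is assumed to be a constant such as this one
private static final String COMMA = ",";

private Function<String, YourJavaItem> mapToItem = (line) -> {

  // a CSV has comma-separated columns; the -1 limit keeps trailing empty columns
  // (note: a plain split does not handle quoted fields that contain commas)
  String[] p = line.split(COMMA, -1);

  YourJavaItem item = new YourJavaItem();

  item.setItemNumber(p[0]); // <-- this is the first column in the csv file
  // split() never yields null elements, but a short line may have fewer than 4 columns
  if (p.length > 3 && !p[3].trim().isEmpty()) {
    item.setSomeProperty(p[3]);
  }
  // more initialization goes here

  return item;
};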
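Two details of String.split matter for mapToItem. With the default limit, trailing empty strings are dropped, so a line whose last columns are empty produces a shorter array and p[3] can throw ArrayIndexOutOfBoundsException; a negative limit keeps those empty columns. In addition, a naive split does not understand CSV quoting. The small demo below shows both effects; the class name, helper methods and sample lines are made up for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Number of columns String.split yields with the default limit (0).
    static int columnsDefault(String line) {
        return line.split(",").length;
    }

    // Number of columns String.split yields with a negative limit (-1).
    static int columnsKeepEmpty(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String trailingEmpty = "42,widget,,";
        // the default limit drops trailing empty strings
        System.out.println(Arrays.toString(trailingEmpty.split(",")));     // prints [42, widget]
        // a limit of -1 keeps them
        System.out.println(Arrays.toString(trailingEmpty.split(",", -1))); // prints [42, widget, , ]

        // a quoted field containing a comma is split in two by a naive split
        String quoted = "7,\"Smith, John\",NY";
        System.out.println(columnsKeepEmpty(quoted)); // prints 4, not 3
    }
}
```

If the input can contain quoted fields, a dedicated CSV parser such as Apache Commons CSV or OpenCSV is a safer choice than split.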

Performance Consideration

In my tests, reading a 90 MB CSV file this way took around 700 ms when run from inside Eclipse.

It may be even faster in production, outside the IDE.
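To get a rough number on your own machine, a sketch like the following could be used. It writes a synthetic CSV to a temporary file (the row format and the 380,000-row count are assumptions modeled on the scenario; the rows are much shorter than real ones, so the file is far smaller than 90 MB), then times the same lines().skip(1).map(...).collect(...) shape a few times. It reads via Files.newBufferedReader (UTF-8) rather than the FileInputStream chain above. This is not a proper benchmark: results depend on hardware, disk caching and JVM warm-up, and a harness such as JMH is the usual tool for careful measurements.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CsvTimingSketch {

    // Writes a header plus `rows` synthetic lines to a temp file (made-up format).
    static Path writeSample(int rows) throws IOException {
        Path file = Files.createTempFile("sample", ".csv");
        List<String> lines = Stream.concat(
                Stream.of("number,name,empty,note"),
                IntStream.range(0, rows).mapToObj(i -> i + ",item-" + i + ",,note"))
            .collect(Collectors.toList());
        Files.write(file, lines);
        return file;
    }

    // Reads the file with the same pipeline shape and returns how many rows were mapped.
    static int readRows(Path file) {
        try (BufferedReader br = Files.newBufferedReader(file)) {
            return br.lines()
                     .skip(1)                                // skip the header
                     .map(line -> line.split(",", -1))      // stand-in for mapToItem
                     .collect(Collectors.toList())
                     .size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample(380_000);
        try {
            // repeat a few times: the first runs include JIT warm-up
            for (int run = 1; run <= 3; run++) {
                long start = System.nanoTime();
                int rows = readRows(file);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println("run " + run + ": " + rows + " rows in " + ms + " ms");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```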

Not bad. Happy coding!

Opinions expressed by DZone contributors are their own.
