Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

How to Read a Large CSV File With Java 8 and Stream API

DZone 's Guide to

How to Read a Large CSV File With Java 8 and Stream API

Scenario: you have to parse a large CSV file (~90MB), practically read the file, and create one java object for each of the lines. What do you do?

· Integration Zone ·
Free Resource

Scenario: you have to parse a large CSV file (~90MB), practically read the file, and create one Java object for each of the lines. In real life, the CSV file contains around 380,000 lines.

Assumption: you already know the path of the CSV file before using the code below.

The following code will read the file and create one Java object per line.

private List<YourJavaItem> processInputFile(String inputFilePath) {

    List<YourJavaItem> inputList = new ArrayList<YourJavaItem>();

    try{

      File inputF = new File(inputFilePath);
      InputStream inputFS = new FileInputStream(inputF);
      BufferedReader br = new BufferedReader(new InputStreamReader(inputFS));

      // skip the header of the csv
      inputList = br.lines().skip(1).map(mapToItem).collect(Collectors.toList());
      br.close();
    } catch (FileNotFoundException|IOException e) {
      ....
    }

    return inputList ;
}

Some explanation about the above code might be needed:

  •  lines() : returns a stream object.

  •  skip(1) : skips the first line in the CSV file, which in this case is the header of the file.

  •  map(mapToItem) : calls the mapToItem  function for each line in the file.

  •  collect(Collectors.toList()) : creates a list containing all the items created by mapToItem  function.

Now, mapToItem  function looks like this:

private Function<String, YourJavaItem> mapToItem = (line) -> {

  String[] p = line.split(COMMA);// a CSV has comma separated lines

  YourJavaItem item = new YourJavaItem();

  item.setItemNumber(p[0]);//<-- this is the first column in the csv file
  if (p[3] != null && p[3].trim().length() > 0) {
    item.setSomeProeprty(p[3]);
  }
  //more initialization goes here

  return item;
}

Performance Consideration

From the testing I've done, it seems that reading a 90 MB CSV file using the way described above will take around 700 ms when running from inside Eclipse. 

It is probably even faster in production.

Not bad. Happy coding!

Topics:
java 8 ,stream api ,csv file

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}