
OpenCSV: Properly Handling Backslashes

OpenCSV is a popular library for handling CSV data in Java projects, but it has a subtle problem with backslashes. Let's see what it is and how to fix it.

OpenCSV is one of the more popular Java libraries for handling CSV data. In this post, I will discuss a specific issue that I recently ran into with this library.

The Problem

Here is a minimal code snippet for writing and reading CSV data using OpenCSV:

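import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

String dataValue = "test";

// Writing: serialize one row with two fields into an in-memory string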
StringWriter writer = new StringWriter();

try (CSVWriter csvwriter = new CSVWriter(writer)) {
    String[] originalData = new String[2];
    originalData[0] = dataValue;
    originalData[1] = dataValue;
    System.out.println("Original data: " + originalData[0] + "," + originalData[1]);
    csvwriter.writeNext(originalData);
} catch (IOException e) {
    throw new RuntimeException(e);
}
System.out.println("Written data: " + writer.toString());

// Reading: parse that string back with the default CSVReader (and CSVParser)
try (CSVReader csvReader = new CSVReader(new StringReader(writer.toString()))) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}


The snippet above prints:

Original data: test,test
Written data: "test","test"

Read data: test,test


This is exactly what we expect. Life is good with OpenCSV until your CSV data contains a backslash character ('\\').

Now let's run the same snippet with a dataValue that contains a backslash character:

String dataValue = "t\\est";


Output:

Original data: t\est,t\est
Written data: "t\est","t\est"

Read data: test,test


Note that the backslash has silently disappeared from the data that was read back.

The Root Cause

By default, CSVReader parses with CSVParser, which uses the backslash ('\\') as its escape character. CSVWriter, meanwhile, uses the double quote ('"') as its escape character.

Because of this mismatch, CSVWriter treats the backslash as an ordinary character and writes it out unescaped. When the data is read back, CSVParser interprets that lone backslash as an escape character and consumes it, so it never makes it into the parsed value.
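To make the root cause concrete, here is a minimal, simplified model in plain Java of a reader whose escape character is a backslash. BackslashEscapeModel and readQuotedContent are hypothetical names written only for illustration; this is not OpenCSV's source code, just a sketch that reproduces the behaviour in the output above, assuming that only a double quote or another backslash counts as an escapable character. It models only the text between a field's surrounding quotes.

```java
// Hypothetical, simplified model of a reader that uses '\' as its escape
// character. It is NOT OpenCSV's implementation; it only mimics the
// behaviour observed in this article for the content of one quoted field.
final class BackslashEscapeModel {

    private static final char ESCAPE = '\\';
    private static final char QUOTE = '"';

    /** Parses the raw text between a field's surrounding quotes. */
    static String readQuotedContent(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ESCAPE) {
                // Assumption: only a quote or another backslash is escapable.
                // Any other backslash is consumed and nothing is appended.
                if (i + 1 < raw.length()) {
                    char next = raw.charAt(i + 1);
                    if (next == QUOTE || next == ESCAPE) {
                        sb.append(next);
                        i++; // skip the character we just appended
                    }
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A writer whose escape char is '"' leaves t\est untouched...
        System.out.println(readQuotedContent("t\\est"));     // prints test
        // ...whereas a backslash written as \\ would survive the read.
        System.out.println(readQuotedContent("t\\\\est"));   // prints t\est
    }
}
```

Running main prints test and then t\est: under this model the backslash survives only if the writer had escaped it as \\, which CSVWriter does not do with its default escape character.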

The Solution

By default, CSVReader uses CSVParser to parse CSV data. OpenCSV also provides an alternative parser, RFC4180Parser, which strictly follows the RFC 4180 standard.

With RFC4180Parser, CSVReader uses the double quote ('"') as its escape character, which makes it consistent with CSVWriter.
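RFC 4180 has no backslash escaping at all: a field containing special characters is enclosed in double quotes, and a double quote inside such a field is written as two double quotes. The sketch below models that rule for a single field. Rfc4180FieldModel, quote and unquote are hypothetical names for illustration, not part of OpenCSV's API, and the class assumes well-formed input and ignores separators and line breaks.

```java
// Hypothetical helper illustrating RFC 4180 field quoting (not OpenCSV's API).
// A field is wrapped in double quotes and every embedded double quote is
// doubled. No other character, including '\', has a special meaning.
final class Rfc4180FieldModel {

    /** Wraps a value in quotes, doubling any embedded quotes. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Reverses quote(); assumes the input is a well-formed quoted field. */
    static String unquote(String quoted) {
        if (quoted.length() < 2 || quoted.charAt(0) != '"'
                || quoted.charAt(quoted.length() - 1) != '"') {
            throw new IllegalArgumentException("Not a quoted field: " + quoted);
        }
        String inner = quoted.substring(1, quoted.length() - 1);
        // Each pair of quotes inside the field stands for one literal quote.
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String value = "t\\est";                  // Java literal for t\est
        String written = quote(value);
        System.out.println(written);              // prints "t\est" (with quotes)
        System.out.println(unquote(written));     // prints t\est
        System.out.println(quote("say \"hi\""));  // prints "say ""hi"""
    }
}
```

Because the backslash is an ordinary character under these rules, t\est round-trips unchanged, which matches what the article observes once CSVReader is built with RFC4180Parser.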

To use it, replace the reading part of the earlier snippet with the following code:

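// Additional imports needed for this version of the reading code
import com.opencsv.CSVReaderBuilder;
import com.opencsv.RFC4180Parser;
import com.opencsv.RFC4180ParserBuilder;

// Parse with RFC4180Parser instead of the default CSVParser
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();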
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(new StringReader(writer.toString()))
                .withCSVParser(rfc4180Parser);
try (CSVReader csvReader = csvReaderBuilder.build()) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}


Output:

Original data: t\est,t\est
Written data: "t\est","t\est"

Read data: t\est,t\est


P.S.: Apache Commons CSV is a good alternative to OpenCSV.

The library version used for the code snippets was OpenCSV 4.0.

Published at DZone with permission of
