OpenCSV: Properly Handling Backslashes
OpenCSV is a popular library for handling CSV data in Java projects, but there's a slight problem when dealing with backslashes. Let's see what it is and how to fix it.
OpenCSV is one of the most popular Java libraries for handling CSV data. In this post, I will discuss one specific issue that I recently faced with this library.
The Problem
Here is a minimal code snippet for writing and reading CSV data using OpenCSV:
String dataValue = "test";

// writing
StringWriter writer = new StringWriter();
try (CSVWriter csvwriter = new CSVWriter(writer)) {
    String[] originalData = new String[2];
    originalData[0] = dataValue;
    originalData[1] = dataValue;
    System.out.println("Original data: " + originalData[0] + "," + originalData[1]);
    csvwriter.writeNext(originalData);
} catch (IOException e) {
    throw new RuntimeException(e);
}
System.out.println("Written data: " + writer.toString());

// reading
try (CSVReader csvReader = new CSVReader(new StringReader(writer.toString()))) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}
The output of the above snippet gives us:
Original data: test,test
Written data: "test","test"
Read data: test,test
This is as expected. Life is good with OpenCSV until you encounter a backslash character ('\\') in your CSV data.
So let's try running the same snippet with dataValue having a backslash character:
String dataValue = "t\\est";
Output:
Original data: t\est,t\est
Written data: "t\est","t\est"
Read data: test,test
Note that the backslash character is written correctly but is missing from the data that was read back.
The Root Cause
By default, CSVReader uses the backslash ('\\') as its escape character, while CSVWriter uses a double quote ('"') as its escape character.
Because of this mismatch, CSVWriter writes a backslash as-is, without escaping it. When the data is read back, CSVParser treats that single backslash as an escape character and drops it from the value.
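To make the reading side concrete, here is a minimal sketch in plain Java of a parser that treats the backslash as an escape character. It is a simplified illustration, not OpenCSV's actual code: the class BackslashEscapeDemo and the helper unescape are hypothetical names, and the rule that a backslash is only honored before a quote or another backslash is an assumption chosen to match the output shown above.

```java
// Simplified illustration only; this is NOT OpenCSV's actual parser code.
class BackslashEscapeDemo {

    // Hypothetical helper: unescapes the content of one quoted field,
    // treating '\' as the escape character. In this sketch an escape is
    // honored only before a quote or another backslash; a lone backslash
    // is silently discarded.
    static String unescape(String content) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '\\') {
                boolean nextIsEscapable = i + 1 < content.length()
                        && (content.charAt(i + 1) == '\\' || content.charAt(i + 1) == '"');
                if (nextIsEscapable) {
                    out.append(content.charAt(++i)); // keep the escaped character
                }
                // otherwise the lone backslash is dropped
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Field content as the writer produced it: t\est
        System.out.println(unescape("t\\est"));
        // Field content with the backslash itself escaped: t\\est
        System.out.println(unescape("t\\\\est"));
    }
}
```

With this rule, the unescaped field t\est comes back as test, while a field in which the backslash had been escaped (t\\est) would come back as t\est. The data loss comes from the writer and the reader disagreeing about which character needs escaping.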
The Solution
By default, CSVReader uses CSVParser for parsing CSV data. OpenCSV provides another parser (RFC4180Parser) that strictly follows the RFC 4180 standard.
Using RFC4180Parser, the CSVReader will use a double quote ('"') as the escape character, making it consistent with CSVWriter.
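As a rough illustration of why the double-quote convention round-trips cleanly, the following plain-Java sketch quotes and unquotes a single field the RFC 4180 way. It does not use OpenCSV and is not its implementation; QuoteDoublingDemo, quote, and unquote are hypothetical names introduced only for this example.

```java
// Simplified illustration of RFC 4180-style quoting; not OpenCSV's implementation.
class QuoteDoublingDemo {

    // Hypothetical helper: wraps a value in double quotes and escapes
    // embedded double quotes by doubling them.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: reverses quote(). Only doubled quotes are
    // special, so backslashes pass through untouched.
    static String unquote(String field) {
        String content = field.substring(1, field.length() - 1);
        return content.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String original = "t\\est \"quoted\"";
        String written = quote(original);
        System.out.println("Written: " + written);
        System.out.println("Round trip matches: " + unquote(written).equals(original));
    }
}
```

Because the writer and the reader agree that the double quote is the only character needing escape, a backslash is just ordinary data on both sides.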
We need to replace the reading part of the above-mentioned snippet with the following code:
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(new StringReader(writer.toString()))
        .withCSVParser(rfc4180Parser);
try (CSVReader csvReader = csvReaderBuilder.build()) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}
Output:
Original data: t\est,t\est
Written data: "t\est","t\est"
Read data: t\est,t\est
P.S.: Apache Commons CSV is a good alternative to OpenCSV.
The library version used for the code snippets was OpenCSV 4.0.
Published at DZone with permission of Vatsal Mevada. See the original article here.