Built-in .NET CSV Parser
In administrative systems, there is often a need to import and parse CSV files. .NET actually has a built-in CSV parser, although it is well hidden in a Visual Basic namespace, Microsoft.VisualBasic.FileIO. If I had known about it, I wouldn't have had to write all those custom (and sometimes buggy) parsers.
To really put the parser to the test, I'm going to parse a CSV file in Swedish format:
```
Name; FactoryLocation; EstablishedYear; ProfitMillionSEK
Volvo; "Gothenburg, Sweden; Gent, Belgium"; 1926; 0,345463
#A comment line
Saab; Trollhättan, Sweden; 1945; -3 009
```
Note the embedded ; in Volvo's FactoryLocation field: it is part of the field text, not a field delimiter.
Three special formatting rules apply to Swedish CSV files:
- The decimal separator is a comma (,).
- The field delimiter is a semicolon (;), so that it is not confused with the decimal separator.
- The thousands separator in numbers is a space.
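The three rules map onto .NET's NumberFormatInfo. The following is a minimal sketch, and the helper name SwedishNumbers is hypothetical. Rather than relying on the "sv-SE" culture data, whose group separator is typically a non-breaking space rather than an ordinary one, it builds a NumberFormatInfo explicitly with a comma decimal separator and a space group separator. It then parses with NumberStyles.Float | NumberStyles.AllowThousands so that both the leading minus sign and the grouping space are accepted.

```csharp
using System.Globalization;

// Hypothetical helper: parses numbers written with the Swedish rules
// (comma as decimal separator, space as thousands separator).
public static class SwedishNumbers
{
    // Built explicitly so the result does not depend on installed culture data.
    private static readonly NumberFormatInfo Format = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = " ",
        NegativeSign = "-"
    };

    // Float allows a leading sign and a decimal separator;
    // AllowThousands lets the space group separator through.
    public static double Parse(string text) =>
        double.Parse(text, NumberStyles.Float | NumberStyles.AllowThousands, Format);
}
```

With this, SwedishNumbers.Parse("0,345463") yields 0.345463 and SwedishNumbers.Parse("-3 009") yields -3009.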
I really did my best to come up with a format that demands a flexible parser, but TextFieldParser has very flexible configuration options, and it just worked.
To use TextFieldParser, add a reference to the Microsoft.VisualBasic assembly to the project. Then it's just a matter of instantiating the parser, setting the needed configuration through properties, and starting to parse.
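```csharp
using System.Collections.Generic;
using System.Globalization;
// TextFieldParser is in the Microsoft.VisualBasic.FileIO namespace.
using Microsoft.VisualBasic.FileIO;

public class Brand
{
    public string Name { get; set; }
    public string FactoryLocation { get; set; }
    public int EstablishedYear { get; set; }
    public double Profit { get; set; }
}

public static class BrandReader
{
    // Culture used to parse numbers written in the Swedish format.
    private static readonly CultureInfo swedishCulture = new CultureInfo("sv-SE");

    public static IEnumerable<Brand> ReadBrands(string path)
    {
        using (TextFieldParser parser = new TextFieldParser(path))
        {
            parser.CommentTokens = new string[] { "#" };
            parser.SetDelimiters(new string[] { ";" });
            parser.HasFieldsEnclosedInQuotes = true;

            // Skip over header line.
            parser.ReadLine();

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                yield return new Brand()
                {
                    Name = fields[0],
                    FactoryLocation = fields[1],
                    EstablishedYear = int.Parse(fields[2]),
                    Profit = double.Parse(fields[3], swedishCulture)
                };
            }
        }
    }
}
```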
The parser can be configured with comment tokens and delimiters, and it handles fields enclosed in quotes. It also offers several read methods: it can read a line as a plain string (ReadLine), split a line into fields (ReadFields), or read the rest of the file as one large string (ReadToEnd). To be honest, it's far better than any of the parsers I've written.
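To see the comment token and quote handling on the Swedish sample without a file on disk, the sketch below feeds the same text to TextFieldParser through its TextReader constructor. The class name SampleParse and method ReadAllRows are hypothetical, and TrimWhiteSpace is set explicitly so that the space following each ; is dropped from the field values. It assumes the Microsoft.VisualBasic assembly is available, as described earlier.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class SampleParse
{
    // The Swedish sample file, held in a string for illustration.
    public const string Sample =
        "Name; FactoryLocation; EstablishedYear; ProfitMillionSEK\n" +
        "Volvo; \"Gothenburg, Sweden; Gent, Belgium\"; 1926; 0,345463\n" +
        "#A comment line\n" +
        "Saab; Trollhättan, Sweden; 1945; -3 009\n";

    // Returns every non-comment line (header included) split into fields.
    public static List<string[]> ReadAllRows(string csv)
    {
        var rows = new List<string[]>();
        // TextFieldParser also accepts a TextReader, so no file is needed.
        using (var parser = new TextFieldParser(new StringReader(csv)))
        {
            parser.CommentTokens = new[] { "#" };    // lines starting with # are skipped
            parser.SetDelimiters(";");               // field delimiter
            parser.HasFieldsEnclosedInQuotes = true; // a ; inside quotes is field text
            parser.TrimWhiteSpace = true;            // drop the space after each ;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields != null) rows.Add(fields);
            }
        }
        return rows;
    }
}
```

The comment line never shows up in the result, and Volvo's second field comes back as the single value Gothenburg, Sweden; Gent, Belgium.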
The only things I could possibly wish for are built-in conversion to data types other than strings, and object materialization. That could be an interesting thing to write, so maybe I'll come back with a materialization wrapper.
Wouldn't it be cool to have a CSV parser based on data annotations? Just create a class with the proper annotations and then parse data from a CSV file into it automatically!
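One way such an annotation-based wrapper might look is the minimal sketch below. Everything in it is hypothetical: CsvColumnAttribute, AnnotatedBrand and CsvMaterializer are illustrative names, not part of .NET. Reflection pairs each annotated property with a zero-based column index, and each field is converted with Convert.ChangeType using a caller-supplied IFormatProvider, for example a NumberFormatInfo configured for the Swedish rules. The delimiter and comment token are hard-coded to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.VisualBasic.FileIO;

// Hypothetical attribute: maps a property to a zero-based CSV column.
[AttributeUsage(AttributeTargets.Property)]
public sealed class CsvColumnAttribute : Attribute
{
    public CsvColumnAttribute(int index) => Index = index;
    public int Index { get; }
}

// Hypothetical target class annotated with column indexes.
public class AnnotatedBrand
{
    [CsvColumn(0)] public string Name { get; set; } = "";
    [CsvColumn(1)] public string FactoryLocation { get; set; } = "";
    [CsvColumn(2)] public int EstablishedYear { get; set; }
    [CsvColumn(3)] public double Profit { get; set; }
}

public static class CsvMaterializer
{
    // Reads each data row and materializes it as a new T.
    public static IEnumerable<T> Read<T>(TextReader reader, IFormatProvider provider, bool skipHeader = true)
        where T : new()
    {
        // Pair each annotated property with its attribute once, up front.
        var columns = typeof(T).GetProperties()
            .Select(p => (Property: p, Attr: p.GetCustomAttribute<CsvColumnAttribute>()))
            .Where(c => c.Attr != null)
            .ToList();

        using (var parser = new TextFieldParser(reader))
        {
            parser.CommentTokens = new[] { "#" };
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true;
            parser.TrimWhiteSpace = true;

            if (skipHeader && !parser.EndOfData) parser.ReadFields();

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields == null) break;

                var item = new T();
                foreach (var (property, attr) in columns)
                {
                    // Convert.ChangeType turns the string into int, double, etc.
                    // using the provider's number format.
                    object value = Convert.ChangeType(fields[attr.Index], property.PropertyType, provider);
                    property.SetValue(item, value);
                }
                yield return item;
            }
        }
    }
}
```

A usage example would call CsvMaterializer.Read<AnnotatedBrand>(new StringReader(csv), numberFormat), where numberFormat is a NumberFormatInfo with "," as NumberDecimalSeparator and " " as NumberGroupSeparator. Unannotated properties are simply left at their defaults.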
Published at DZone with permission of Anders Abel, DZone MVB. See the original article here.