Python 2.7 CSV Files with Unicode Characters
The csv module in Python 2.7 is more-or-less hard-wired to work with byte strings, which in practice means ASCII. It does not accept Unicode input directly.
Sadly, we're often confronted with CSV files that include Unicode characters. There are numerous Stack Overflow questions on this topic. http://stackoverflow.com/search?q=python+csv+unicode
What to do? Since csv is married to seeing ASCII/bytes, we must explicitly decode the column values.
One solution is to wrap csv.DictReader, something like the following. We need to decode each individual column before attempting to do anything with the value.
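```python
import codecs
import csv

class UnicodeDictReader(object):
    def __init__(self, *args, **kw):
        # 'encoding' is consumed here; all other arguments go to DictReader.
        self.encoding = kw.pop('encoding', 'mac_roman')
        self.reader = csv.DictReader(*args, **kw)

    def __iter__(self):
        # getdecoder() returns a function that yields (text, bytes_consumed).
        decode = codecs.getdecoder(self.encoding)
        for row in self.reader:
            # Decode each column value; the keys are left as byte strings.
            t = dict((k, decode(row[k])[0]) for k in row)
            yield t
```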
This new object is an iterable which contains a DictReader. We could subclass DictReader, also.
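The subclass variant might look something like the following minimal sketch. It is an illustration, not the original author's code: the helper name `_decode_row` is hypothetical, and the `isinstance` check is an added assumption so that missing values (None) and overflow columns (lists) pass through untouched. Both `next` and `__next__` are defined so the sketch also loads on newer Pythons, where csv already yields text and the decode step is simply skipped.

```python
import codecs
import csv


class UnicodeDictReader(csv.DictReader):
    """DictReader subclass that decodes byte-string values to text (sketch)."""

    def __init__(self, *args, **kw):
        # 'encoding' mirrors the keyword used by the wrapper class.
        self.encoding = kw.pop('encoding', 'mac_roman')
        self._decode = codecs.getdecoder(self.encoding)
        # DictReader is a classic class in Python 2.7, so call the base
        # initializer directly rather than through super().
        csv.DictReader.__init__(self, *args, **kw)

    def _decode_row(self, row):
        # Hypothetical helper: decode byte strings only, leave anything
        # else (None, lists, already-decoded text) as it is.
        return dict(
            (k, self._decode(v)[0] if isinstance(v, bytes) else v)
            for k, v in row.items()
        )

    def next(self):
        # Python 2 iterator protocol.
        return self._decode_row(csv.DictReader.next(self))

    def __next__(self):
        # Python 3 iterator protocol.
        return self._decode_row(csv.DictReader.__next__(self))
```

Compared with the wrapper, the subclass keeps DictReader attributes such as fieldnames directly available on the reader object, at the cost of depending on DictReader's iterator methods.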
The use case, then, becomes something simple like this.
```python
with open("some.csv", "rU") as source:
    rdr = UnicodeDictReader(source)
    for row in rdr:
        pass  # process the row
```
We can now get Unicode characters from a CSV file.
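The result is only correct if the encoding matches the file; mac_roman is just the default chosen for the reader's encoding keyword. As a small sketch of the decoding step in isolation, the snippet below decodes a made-up sample byte string (assumed data, not taken from any real file) under two encodings. It also shows why the reader takes element [0] of the decoder's result: the decoder returns a (text, bytes consumed) pair.

```python
import codecs

# Sample bytes as they might appear in one CSV cell (assumed data).
raw = b'caf\x8e'

# A decoder function returns a pair: (decoded text, number of bytes consumed).
decode = codecs.getdecoder('mac_roman')
text, consumed = decode(raw)

# The same bytes decoded under a different encoding give different text,
# so a wrong guess corrupts data silently instead of raising an error.
as_latin1 = codecs.getdecoder('latin_1')(raw)[0]

print(repr(text))       # 0x8E is e-acute (U+00E9) in mac_roman
print(repr(as_latin1))  # 0x8E is a control character (U+008E) in latin_1
```

If the source of a file is unknown, comparing a few decoded cells this way is a cheap sanity check before passing an encoding to the reader.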
Source: http://slott-softwarearchitect.blogspot.com/2012/01/python-27-csv-files-with-unicode.html