Handling Accented Characters With Python Regular Expressions
Join the DZone community and get the full member experience.
Join For Free[A-z] just isn't good enough!
import re
string = 'riché'
print string
riché
richre = re.compile('([A-z]+)')
match = richre.match(string)
print match.groups()
('rich',)
richre = re.compile('(\w+)',re.LOCALE)
match = richre.match(string)
print match.groups()
('rich',)
richre = re.compile('([é\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)
richre = re.compile('([\xe9\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)
richre = re.compile('([\xe9-\xf8\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)
string = 'richéñ'
match = richre.match(string)
print match.groups()
('rich\xe9\xf1',)
richre = re.compile('([\u00E9-\u00F8\w]+)')
print match.groups()
('rich\xe9\xf1',)
matched = match.group(1)
print matched
richéñ
Python (language)
Opinions expressed by DZone contributors are their own.
Comments