
Parsing UTF-8 Encoded Strings In Ruby

Instead of setting $KCODE = 'UTF8' together with require 'jcode', you can use the /u regex option
to parse UTF-8 strings containing multibyte characters.

By the way, a Latin1 <-> UTF-8 conversion hack can be found here:
http://rubyforge.org/pipermail/fxruby-users/2005-September/000480.html
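
One common way to do such a conversion is to round-trip through pack/unpack. The sketch below is not taken from the linked post; the helper names latin1_to_utf8 and utf8_to_latin1 are made up for illustration. It relies on the fact that Latin-1 byte values 0..255 equal the corresponding Unicode code points.

```ruby
# Hypothetical helpers, not the code from the linked mailing-list post.

# Read each Latin-1 byte as a number ("C*") and write it back
# as a UTF-8 encoded code point ("U*").
def latin1_to_utf8(bytes)
  bytes.unpack("C*").pack("U*")
end

# Reverse direction: decode UTF-8 code points and write one byte each.
# Only meaningful when every code point is <= 255.
def utf8_to_latin1(text)
  text.unpack("U*").pack("C*")
end

latin1 = "abc\344"                 # \344 is ä as a single Latin-1 byte
utf8   = latin1_to_utf8(latin1)
puts utf8.unpack("C*").inspect     # [97, 98, 99, 195, 164]
puts utf8_to_latin1(utf8).unpack("C*").inspect  # [97, 98, 99, 228]
```

Characters outside Latin-1 cannot be represented in the reverse direction, so utf8_to_latin1 should only be fed text known to fit.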

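For comparison, drop the u option and run the examples again: on Ruby 1.8 with $KCODE unset, the regex then works byte by byte and splits ä in two.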



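string = "abc\303\244"  # \303\244 is the two-byte UTF-8 encoding of ä

# Count characters rather than bytes: prints 4
puts string.scan(/./u).size

# Reverse character by character: prints äcba
puts string.split(//u).reverse.join

# Strip the last character (both of its bytes): prints abc
puts string.gsub(/.$/u, '')

# Match the first four characters, including the multibyte ä
regex = /..../u
md = regex.match(string)
puts md[0].inspect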
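
On Ruby 1.9 and later, strings carry their own encoding, so the same results should be reachable without any regex at all. This is a minimal sketch assuming such a Ruby version; the explicit force_encoding call is only there to make the assumption visible.

```ruby
# Assumes Ruby 1.9+, where String methods are encoding-aware.
string = "abc\303\244".dup.force_encoding("UTF-8")  # same bytes as above

puts string.length     # 4 characters
puts string.bytesize   # 5 bytes
puts string.reverse    # äcba
puts string.chop       # abc
```

The /u examples above still work there, but length, reverse and chop express the intent more directly.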

