

Parsing UTF-8 Encoded Strings In Ruby

Instead of setting $KCODE = 'UTF8' together with require 'jcode', you can use the /u regex option
to parse UTF-8 strings containing multibyte characters.
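To see why multibyte characters need special handling, here is a small sketch (not part of the original snippet) that counts the same string once as bytes and once as UTF-8 characters. It only relies on String#unpack with the 'C*' and 'U*' directives; the variable names are illustrative.

```ruby
string = "abc\303\244"  # "abc" followed by the two bytes that encode ä in UTF-8

bytes      = string.unpack('C*')  # one integer per byte
codepoints = string.unpack('U*')  # one integer per UTF-8 encoded character

puts bytes.size       # 5, because ä occupies two bytes
puts codepoints.size  # 4, because ä is a single character
```

The one-byte gap between the two counts is what a byte-wise regex trips over.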

A Latin1 <-> UTF-8 conversion hack btw can be found here:

For comparison, just drop the u option! On Ruby 1.8 the patterns then match byte by byte instead of character by character.

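string = "abc\303\244"  #  \303\244 are the two UTF-8 bytes of ä

# count characters: each . matches one whole UTF-8 character
puts string.scan(/./u).size               # => 4

# reverse the string character by character
puts string.split(//u).reverse.join       # => äcba

# remove the last character
puts string.gsub(/.$/u, '')               # => abc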

regex =
md = regex.match(string)
puts md[0].inspect
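On Ruby 1.9 and later, strings carry their own encoding, so the byte-wise versus character-wise comparison can also be reproduced by tagging the same bytes with different encodings. The following is a minimal sketch under that assumption; char_count is a hypothetical helper, not something from the original snippet.

```ruby
# Hypothetical helper: counts how many times "." matches in the string.
def char_count(str)
  str.scan(/./m).size
end

bytes = "abc\xC3\xA4"  # same bytes as "abc\303\244"

# Same bytes, two interpretations (dup so the literal itself is not modified).
utf8   = bytes.dup.force_encoding(Encoding::UTF_8)
binary = bytes.dup.force_encoding(Encoding::BINARY)

puts char_count(utf8)    # 4: ä counts as one character
puts char_count(binary)  # 5: every byte counts separately
```

The binary case gives the kind of result the u-less patterns produce on Ruby 1.8.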