Oooh, I hate character sets. Specifically, that there is more than one of them. Here is a Ruby version of a Python script I found that converts cp1252 (aka windows-1252) into UTF-8.
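    def clean_up(dirty_text)
      newstr = "".b  # binary buffer (String#b needs Ruby 2.0+)
      # each_byte yields integers, so the comparisons below work on any Ruby version
      dirty_text.each_byte do |byte|
        newstr << if byte < 0x80
                    byte.chr                     # plain ASCII passes through
                  elsif byte < 0xC0
                    "\xC2".b + byte.chr          # 0x80..0xBF: lead byte C2, same trail byte
                  else
                    "\xC3".b + (byte - 64).chr   # 0xC0..0xFF: lead byte C3, trail byte minus 64
                  end
      end
      newstr.force_encoding("UTF-8")
    end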
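In case the magic numbers 0xC2, 0xC3 and 64 look arbitrary: any code point from U+0080 to U+00FF takes two bytes in UTF-8, a lead byte `0xC0 | (cp >> 6)` and a trail byte `0x80 | (cp & 0x3F)`. For 0x80..0xBF that works out to 0xC2 followed by the byte itself, and for 0xC0..0xFF to 0xC3 followed by the byte minus 64. Here is a small sketch that checks the shortcut against the general formula and against Ruby's own `Array#pack("U")`. The helper names `general_bytes` and `shortcut_bytes` are made up for this check and are not part of the script.

```ruby
# General two-byte UTF-8 encoding for a code point in 0x80..0x7FF.
def general_bytes(cp)
  [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
end

# The shortcut used by the conversion function (only valid for 0x80..0xFF).
def shortcut_bytes(cp)
  cp < 0xC0 ? [0xC2, cp] : [0xC3, cp - 64]
end

(0x80..0xFF).each do |cp|
  builtin = [cp].pack("U").bytes.to_a  # Ruby's own UTF-8 encoding of the code point
  unless general_bytes(cp) == shortcut_bytes(cp) && shortcut_bytes(cp) == builtin
    raise "mismatch at 0x#{cp.to_s(16)}"
  end
end
puts "shortcut agrees with UTF-8 for 0x80..0xFF"
```

Note that this treats each input byte as the Unicode code point with the same number, which is how ISO-8859-1 (Latin-1) works.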
The original Python script was (http://miscoranda.com/96):
    #!/usr/bin/python
    import sys

    for c in sys.stdin.read():
        if ord(c) < 0x80:
            sys.stdout.write(c)
        elif ord(c) < 0xC0:
            sys.stdout.write('\xC2' + c)
        else:
            sys.stdout.write('\xC3' + chr(ord(c) - 64))
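One caveat worth knowing: both scripts map every byte to the code point with the same number, which is exactly right for ISO-8859-1. cp1252 differs in the 0x80..0x9F range, where it puts printable characters such as curly quotes and the euro sign, so those bytes come out as C1 control characters instead. If that matters for your data, here is a minimal sketch, assuming Ruby 1.9 or later, that leans on the built-in transcoder instead. The method name `cp1252_to_utf8` is hypothetical.

```ruby
# Hypothetical helper; not part of the original script.
def cp1252_to_utf8(bytes)
  # Label a copy of the raw bytes as Windows-1252, then transcode to UTF-8.
  bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

latin = "caf\xE9".b        # "cafe" with an e-acute stored as the single byte 0xE9
puts cp1252_to_utf8(latin)

quoted = "\x93hi\x94".b    # cp1252 curly double quotes around "hi"
puts cp1252_to_utf8(quoted)
```

For bytes in 0xA0..0xFF this gives the same result as the hand-rolled version; the difference only shows up for 0x80..0x9F.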