DZone Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Snippets has posted 5883 posts at DZone. View Full User Profile

Convert Unicode Codepoints To UTF-8 Characters With Module#const_missing

09.15.2007
| 15903 views |
  • submit to reddit
        From: http://www.davidflanagan.com/blog/2007_08.html#000136
Author: David Flanagan


# This module lazily defines constants of the form Uxxxx for all Unicode
# codepoints from U0000 to U10FFFF. The value of each constant is the
# UTF-8 string for the codepoint.
# Examples:
#   copyright = Unicode::U00A9
#   euro = Unicode::U20AC
#   infinity = Unicode::U221E
#
module Unicode
  def self.const_missing(name)  
    # Check that the constant name is of the right form: U0000 to U10FFFF
    if name.to_s =~ /^U([0-9a-fA-F]{4,5}|10[0-9a-fA-F]{4})$/
      # Convert the codepoint to an immutable UTF-8 string,
      # define a real constant for that value and return the value
      #p name, name.class
      const_set(name, [$1.to_i(16)].pack("U").freeze)
    else  # Raise an error for constants that are not Unicode.
      raise NameError, "Uninitialized constant: Unicode::#{name}"
    end
  end
end


puts copyright = Unicode::U00A9
puts euro = Unicode::U20AC
puts euro = Unicode::U20AC
puts infinity = Unicode::U221E
puts Unicode.const_get(:U221E)
p Unicode.constants
puts Unicode.constants
Unicode.constants.each { |u| puts Unicode.const_get(u) }