Convert Unicode Codepoints To UTF-8 Characters With Module#const_missing

From: http://www.davidflanagan.com/blog/2007_08.html#000136
Author: David Flanagan

# This module lazily defines constants of the form Uxxxx for all Unicode
# codepoints from U0000 to U10FFFF. The value of each constant is the
# UTF-8 string for the codepoint.
# Examples:
#   copyright = Unicode::U00A9
#   euro = Unicode::U20AC
#   infinity = Unicode::U221E
module Unicode
  def self.const_missing(name)  
    # Check that the constant name is of the right form: U0000 to U10FFFF
    if name.to_s =~ /^U([0-9a-fA-F]{4,5}|10[0-9a-fA-F]{4})$/
      # Convert the codepoint to an immutable UTF-8 string,
      # define a real constant for that value and return the value
      #p name, name.class
      const_set(name, [$1.to_i(16)].pack("U").freeze)
    else  # Raise an error for constants that are not Unicode.
      raise NameError, "Uninitialized constant: Unicode::#{name}"
    end
  end
end

puts copyright = Unicode::U00A9
puts euro = Unicode::U20AC
puts euro = Unicode::U20AC  # second lookup: the constant is now defined, so const_missing is not called
puts infinity = Unicode::U221E
puts Unicode.const_get(:U221E)
p Unicode.constants
puts Unicode.constants
Unicode.constants.each { |u| puts Unicode.const_get(u) }
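
The lazy definition described in the header comment can be checked directly. The following is a minimal, self-contained sketch rather than part of the original snippet: `LazyUnicode` and its `misses` counter are hypothetical names introduced here only to count how often the `const_missing` hook runs, and the pattern is anchored with `\A` and `\z` instead of `^` and `$`.

```ruby
# Sketch under assumptions: LazyUnicode and misses are hypothetical names,
# added only to observe how often const_missing is invoked.
module LazyUnicode
  @misses = 0

  class << self
    attr_reader :misses  # number of times const_missing has run
  end

  def self.const_missing(name)
    @misses += 1
    if name.to_s =~ /\AU([0-9a-fA-F]{4,5}|10[0-9a-fA-F]{4})\z/
      # Define a real constant so later lookups bypass this hook
      const_set(name, [$1.to_i(16)].pack("U").freeze)
    else
      raise NameError, "uninitialized constant LazyUnicode::#{name}"
    end
  end
end

first  = LazyUnicode::U20AC  # triggers const_missing and defines the constant
second = LazyUnicode::U20AC  # ordinary constant lookup, hook is not called

puts first                                # the euro sign
puts LazyUnicode.misses                   # 1
puts first.equal?(second)                 # true: same frozen String object
puts LazyUnicode.const_defined?(:U20AC)   # true
puts LazyUnicode.const_defined?(:U221E)   # false: never referenced, never defined
```

Because `const_set` stores the string the first time a codepoint is referenced, only the constants a program actually uses ever exist, which is why `Unicode.constants` above lists just the few that were touched.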

