
Cleaning Strings With Regular Expressions

First, strip every character outside the printable ASCII range (\x20 to \x7E).


=> text = "Normal ©®»λαβstring"
"Normal ©®»λαβstring"
=> stripped = text.gsub(/[^\x20-\x7E]/, '')
"Normal string"
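
The same cleanup can be wrapped in a small helper. The sketch below is illustrative only: `strip_non_printable_ascii` and `drop_non_ascii` are hypothetical names, not part of the original snippet. The second variant swaps in `String#encode`, which drops non-ASCII characters but, unlike the regex, leaves ASCII control characters such as tabs and newlines in place.

```ruby
# Hypothetical helpers, not part of the original snippet.

# Keep only printable ASCII (space through tilde); control characters go too.
def strip_non_printable_ascii(str)
  str.gsub(/[^\x20-\x7E]/, '')
end

# Alternative: transcode to US-ASCII and drop whatever cannot be represented.
# Tabs and newlines are valid ASCII, so they survive.
def drop_non_ascii(str)
  str.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '')
end

sample = "Normal ©®»λαβstring\n"
puts strip_non_printable_ascii(sample).inspect  # => "Normal string"
puts drop_non_ascii(sample).inspect             # => "Normal string\n"
```

Which variant fits depends on whether line breaks in the input should be kept.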


Now let's get rid of HTML tags, optionally preserving some of them.


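# strip html tags, keeping those listed in preserve_tags
def strip_html(str, preserve_tags = ['p'])
  return '' unless str.is_a?(String)

  str = str.strip
  # nothing to preserve: remove every tag
  return str.gsub(/<[^>]*>/, '') if preserve_tags.empty?

  # negative lookahead on the whole tag name, so 'p' does not also protect <pre>
  preserve_el = preserve_tags.map { |tag| Regexp.escape(tag) }.join('|')
  str.gsub(/<(?!\s*\/?\s*(?:#{preserve_el})\b)[^>]*>/, '')
end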

=> text = "<p>This is a <a href='#'>link</a> and a <span>span</span></p>"
"<p>This is a <a href='#'>link</a> and a <span>span</span></p>"
=> stripped = strip_html(text)
"<p>This is a link and a span</p>"
=> stripped = strip_html(text, [])
"This is a link and a span"

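
When a regular expression has to skip certain tag names, a negated character class and a negative lookahead behave quite differently: the character class excludes single characters, while the lookahead excludes a whole name. The comparison below is a sketch under assumptions; the sample markup and the method names `strip_by_char_class` and `strip_by_lookahead` are hypothetical, and both patterns are hard-coded to preserve `p`.

```ruby
# Illustrative comparison; sample markup and method names are assumptions.

# Negated character class: skips any tag whose name merely starts with "p"
# (the class also excludes the literal characters "(", "|", "/" and ")").
def strip_by_char_class(html)
  html.gsub(/<(\/|\s)*[^(p|\/)][^>]*>/, '')
end

# Negative lookahead: skips only tags whose full name is "p".
# The lookahead sits right after "<" so the optional slash cannot be
# backtracked around for closing tags.
def strip_by_lookahead(html)
  html.gsub(/<(?!\s*\/?\s*p\b)[^>]*>/, '')
end

html = '<p>Keep</p> <pre>code</pre> <b>bold</b>'
puts strip_by_char_class(html)  # => <p>Keep</p> <pre>code</pre> bold
puts strip_by_lookahead(html)   # => <p>Keep</p> code bold
```

Neither pattern is a substitute for an HTML parser; both assume that tags never contain a literal `>` inside an attribute value.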
Finally, let's compact whitespace so that at most one space remains between any two words.


=> text = "  This   is some    text with strange  spacing patterns  "
"  This   is some    text with strange  spacing patterns  "
=> stripped = text.gsub(/\s{2,}/, ' ').strip
"This is some text with strange spacing patterns"
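
The same idea can be sketched as a reusable helper. `compact_whitespace` is a hypothetical name, and the choice of `\s+` over `\s{2,}` is an assumption: it also turns a lone tab or newline into a plain space, which may or may not be wanted.

```ruby
# Hypothetical helper, not part of the original snippet.
def compact_whitespace(str)
  # Collapse every run of whitespace (spaces, tabs, newlines) into one space,
  # then trim both ends.
  str.gsub(/\s+/, ' ').strip
end

messy = "  This   is\tsome \n text  "
puts compact_whitespace(messy).inspect  # => "This is some text"

# For comparison, String#squeeze(' ') only collapses repeated spaces,
# so the tab and the newline stay in place.
puts messy.squeeze(' ').strip.inspect   # => "This is\tsome \n text"
```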