
Stripping Out a Non-Breaking Space Character in Ruby


A couple of days ago I was playing with some code to scrape data from a web page and I wanted to skip a row in a table if the row didn’t contain any text.

I initially had the following code to do that:

rows.each do |row|
  next if row.strip.empty?
  # other scraping code
end

Unfortunately that approach broke down fairly quickly because the empty rows contained a non-breaking space, i.e. ‘ ’.

If we try calling strip on a string containing that character, we can see that it doesn’t get stripped:

# its hex representation is A0
> "\u00A0".strip
=> " "
> "\u00A0".strip.empty?
=> false

I wanted to see whether I could use gsub to solve the problem, so I tried the following code, which didn’t help either because ‘\s’ only matches ASCII whitespace:

> "\u00A0".gsub(/\s*/, "")
=> " "
> "\u00A0".gsub(/\s*/, "").empty?
=> false

A bit of googling led me to this Stack Overflow post, which suggests using the POSIX space character class rather than ‘\s’ to match the non-breaking space, because it matches Unicode whitespace characters as well as the ASCII ones.

e.g.

> "\u00A0".gsub(/[[:space:]]+/, "")
=> ""
> "\u00A0".gsub(/[[:space:]]+/, "").empty?
=> true
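
To see the two character classes side by side, here is a minimal sketch that checks a few whitespace characters against each one. The method names matches_shorthand? and matches_posix_space? are made up for illustration and are not part of Ruby or of the Stack Overflow answer.

```ruby
# Hypothetical helpers (names invented for this example) that report
# whether a single character is matched by each character class.

# \s covers only the ASCII whitespace characters.
def matches_shorthand?(char)
  char.match?(/\s/)
end

# [[:space:]] also covers Unicode whitespace such as U+00A0.
def matches_posix_space?(char)
  char.match?(/[[:space:]]/)
end

samples = {
  "space"              => " ",
  "tab"                => "\t",
  "non-breaking space" => "\u00A0"
}

samples.each do |name, char|
  puts "#{name}: \\s=#{matches_shorthand?(char)} [[:space:]]=#{matches_posix_space?(char)}"
end
```

Only the non-breaking space should differ between the two columns, which is why the ‘\s’ version left it in place.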

To avoid indiscriminately removing every space, which causes problems like this one where we mash the two names together…

> "Mark Needham".gsub(/[[:space:]]+/, "")
=> "MarkNeedham"

…the poster suggested the following regex which does the job:

> "\u00A0".gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
=> ""
> ("Mark" + "\u00A0" + "Needham").gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
=> "Mark Needham"
  • \A matches the beginning of the string
  • \z matches the end of the string

So what this bit of code does is match all the whitespace that appears at the beginning or end of the string and replace it with '' (the empty string).
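
Putting it together, the regex could be wrapped in a small helper and used in the original loop. This is only a sketch: strip_whitespace and non_blank_rows are hypothetical names, and the sample rows are invented stand-ins for the scraped table data.

```ruby
# Hypothetical helper: removes leading and trailing whitespace, including
# Unicode whitespace such as the non-breaking space (U+00A0).
def strip_whitespace(text)
  text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

# Hypothetical wrapper around the scraping loop: skips rows that are
# empty once stripped and collects the cleaned text of the others.
def non_blank_rows(rows)
  kept = []
  rows.each do |row|
    cleaned = strip_whitespace(row)
    next if cleaned.empty?
    kept << cleaned
  end
  kept
end

# Invented sample data standing in for the scraped rows.
rows = ["Mark\u00A0Needham", "\u00A0", "  ", "\u00A0 some text \u00A0"]

p non_blank_rows(rows)
```

The rows that contain only a non-breaking space or ordinary spaces are skipped, while the non-breaking space between the two names is left alone.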


Published at DZone with permission of
