
Stripping Out a Non-Breaking Space Character in Ruby



A couple of days ago I was playing with some code to scrape data from a web page and I wanted to skip a row in a table if the row didn’t contain any text.

I initially had the following code to do that:

rows.each do |row|
  next if row.strip.empty?
  # other scraping code
end

Unfortunately, that approach broke down fairly quickly because the empty rows contained a non-breaking space, i.e. the character U+00A0, which renders like an ordinary space.

If we try calling strip on a string containing that character, we can see that it doesn’t get stripped:

# its hex representation is A0
> "\u00A0".strip
=> " "
> "\u00A0".strip.empty?
=> false
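Because a non-breaking space looks identical to an ordinary space when printed, it can help to list a string's code points. The following is a minimal sketch; the helper name `hex_codepoints` is made up for illustration and is not part of the original code.

```ruby
# Hypothetical helper: list every code point of a string in U+XXXX notation,
# which makes invisible characters such as the non-breaking space visible.
def hex_codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

p hex_codepoints(" ")        # an ordinary space
p hex_codepoints("\u00A0")   # a non-breaking space
p "\u00A0".strip.empty?      # strip leaves the non-breaking space in place
```

Running this on the text of a suspicious row shows whether it holds U+0020 or U+00A0.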

I wanted to see whether I could use gsub to solve the problem, so I tried the following code, which didn’t help either:

> "\u00A0".gsub(/\s*/, "")
=> " "
> "\u00A0".gsub(/\s*/, "").empty?
=> false

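A bit of googling led me to this Stack Overflow post, which suggests using the POSIX bracket expression [[:space:]] rather than ‘\s’ to match the non-breaking space. In Ruby, ‘\s’ matches only ASCII whitespace (space, tab, newline, carriage return, form feed and vertical tab), whereas [[:space:]] also matches Unicode whitespace such as U+00A0.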

e.g.

> "\u00A0".gsub(/[[:space:]]+/, "")
=> ""
> "\u00A0".gsub(/[[:space:]]+/, "").empty?
=> true
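The difference between the two character classes can be checked side by side. This is only a sketch; `whitespace_match` is a hypothetical helper introduced for the comparison.

```ruby
# Hypothetical helper: report whether a string contains whitespace according
# to \s (ASCII only in Ruby) and according to the POSIX bracket [[:space:]].
def whitespace_match(str)
  { ascii: str.match?(/\s/), posix: str.match?(/[[:space:]]/) }
end

samples = { "space" => " ", "tab" => "\t", "nbsp" => "\u00A0" }
samples.each do |name, char|
  result = whitespace_match(char)
  puts "#{name}: \\s=#{result[:ascii]} [[:space:]]=#{result[:posix]}"
end
```

Only the non-breaking space is treated differently by the two classes, which is why the gsub with ‘\s’ left it untouched.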

To avoid indiscriminately removing all whitespace, which mashes the two names together like this…

> "Mark Needham".gsub(/[[:space:]]+/, "")
=> "MarkNeedham"

…the poster suggested the following regex which does the job:

> "\u00A0".gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
=> ""
> ("Mark" + "\u00A0" + "Needham").gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
=> "Mark Needham"
  • \A matches the beginning of the string
  • \z matches the end of the string

So what this bit of code does is match all the whitespace at the beginning or end of the string and replace it with the empty string ('').
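Putting this back into the scraping loop might look like the sketch below. The method name `strip_unicode_whitespace` and the sample `rows` array are assumptions made for illustration; the real rows would come from the scraped page.

```ruby
# Hypothetical helper wrapping the regex above: removes leading and trailing
# whitespace, including non-breaking spaces, but keeps whitespace inside.
def strip_unicode_whitespace(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

# Stand-in data: plain strings playing the role of the scraped table rows.
rows = ["Mark\u00A0Needham", "\u00A0", "  ", "\u00A0 London \u00A0"]

kept = []
rows.each do |row|
  text = strip_unicode_whitespace(row)
  next if text.empty? # skips rows holding only (non-breaking) spaces
  kept << text        # other scraping code would go here
end

p kept
```

Rows that contain nothing but ordinary or non-breaking spaces are skipped, while the non-breaking space between the two names is left alone.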


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
