Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Visualizing Search Keyword Overlap

DZone's Guide to

Visualizing Search Keyword Overlap

Visualizing how often search keywords overlap when looking for pest control experts.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

The other day someone asked me to look at the search data for Debug Pest Control, a pest management company based in Rhode Island. One of the things I wanted to visualize was how the search terms overlapped with each other.

To do this, I created a graph where the nodes are the keywords and edges join nodes that share a word. (A “keyword” is actually a set of words, whatever someone types into a search.) The result was a solid gray blob be cause the the nodes are so densely connected.

I thinned the graph by first only looking at keywords that have occurred at least twice. Then I made my connection criteria a little stronger. I take the union of the individual words in two keywords and create a connection if at least 1/3 of these words appear in both keywords.

(You can click on the image to see a larger version.)

The area of each red dot is proportional to the number of times someone has come to the site using that keyword. You can see that there’s long-tail behavior: while some dots are much larger than others, there are a lot of little dots.

In case you’re curious, these are the top keywords:

  1. pest control
  2. ri
  3. pest control ri
  4. pest control rhode island,
  5. plants that keep bees away
  6. poisonous snakes in ri
  7. plants that keep bees and wasps away
  8. plants to keep bees away
  9. plants that keep wasps away
  10. pest control companies 

The keywords were filtered a little before making the graph. Some of the top keywords contained “debug.” These may have been people using the term to refer to eliminating pests from their property, but more likely they were people who knew the company name. By removing “debug” from each of the keywords, the results focus on people searching for the company’s services rather than the company itself.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
visualization ,search engine

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}