Knowledge Bases in Neo4j


From the second we are born we are collecting a wealth of knowledge about the world. This knowledge is accumulated and interrelated inside our brains and it represents what we know. If we could export this knowledge and give it to a computer, it would look like ConceptNet. ConceptNet is a semantic network that…

…is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people’s goals.


I wrote a little Ruby script to import ConceptNet5 into Neo4j, and it gives us a nice graph (243MB) to work with. ConceptNet5 as presented in CSV files is actually a hypergraph, with each assertion carrying the reasons (sources) behind it:

/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]     /r/NotHasProperty     /c/en/old_map     /c/en/very_accurate     /ctx/all     -1     /s/activity/omcs/vote,/s/contributor/omcs/PJ     /e/e529e3a070783cbbe212bc5e721b6938c0a6df6b     /d/conceptnet/4/en     [[Old maps]] are not [[very accurate]]
/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]     /r/NotHasProperty     /c/en/old_map     /c/en/very_accurate     /ctx/all     -1     /s/activity/omcs/vote,/s/contributor/omcs/aghanford     /e/a8ecaed55f5ffba88b6d02da99ecf3fe42bffe55     /d/conceptnet/4/en     [[Old maps]] are not [[very accurate]]
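
To make the layout of these rows concrete, here is a minimal Ruby sketch that splits each one on tabs and names the columns. The column names (`uri`, `rel`, `start`, and so on) and the `parse_assertion` and `edge_key` helpers are my own assumptions inferred from the sample, not an official schema or part of the import script.

```ruby
# Column names are assumptions inferred from the sample rows.
FIELDS = %i[uri rel start end_node context weight sources id dataset surface_text].freeze

# Split one tab-separated assertion line into a hash keyed by FIELDS.
def parse_assertion(line)
  FIELDS.zip(line.chomp.split("\t")).to_h
end

# Key identifying a relationship regardless of who contributed it.
def edge_key(assertion)
  "#{assertion[:start]}-#{assertion[:end_node]}-#{assertion[:rel]}"
end

# The first six columns are identical in both sample rows.
COMMON = ["/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]",
          "/r/NotHasProperty", "/c/en/old_map", "/c/en/very_accurate",
          "/ctx/all", "-1"].freeze

ROWS = [
  COMMON + ["/s/activity/omcs/vote,/s/contributor/omcs/PJ",
            "/e/e529e3a070783cbbe212bc5e721b6938c0a6df6b",
            "/d/conceptnet/4/en", "[[Old maps]] are not [[very accurate]]"],
  COMMON + ["/s/activity/omcs/vote,/s/contributor/omcs/aghanford",
            "/e/a8ecaed55f5ffba88b6d02da99ecf3fe42bffe55",
            "/d/conceptnet/4/en", "[[Old maps]] are not [[very accurate]]"]
].map { |cols| cols.join("\t") }

first, second = ROWS.map { |line| parse_assertion(line) }
puts edge_key(first)
puts edge_key(first) == edge_key(second) # same relationship, different contributors
```

The two rows differ only in their sources and edge ids, so they produce the same start, end, and relationship triple.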

Here two contributors let us know that old maps are not very accurate. That’s great to know, but we don’t really need to represent this twice in our graph. So instead we detect and skip duplicate relationships by using a Bloom filter to check whether we have already seen them.

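require 'bloomfilter-rb'

# In-memory Bloom filter holding a key for every relationship seen so far.
@edge_bf = BloomFilter::Native.new(:size => 212000000, :hashes => 23, :bucket => 8, :raise => false)

# Returns true the first time a (from, to, rel) triple is seen and records it;
# returns false when the filter reports the triple as already present.
def is_unique_rel(from,to,rel)
  key = "#{from}-#{to}-#{rel}"
  return false if @edge_bf.include?(key)
  @edge_bf.insert(key)
  true
end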
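
One caveat: a Bloom filter can return false positives, which here would mean a genuinely new relationship is occasionally mistaken for a duplicate and skipped. The textbook estimate for a classic Bloom filter with m slots, k hash functions, and n inserted keys is (1 - e^(-kn/m))^k. The sketch below plugs in the numbers from this article, assuming that `:size` corresponds to m, `:hashes` to k, and the roughly 7.5 million relationships of the finished import to n. The gem's bucketed implementation may behave somewhat differently, so treat the result as a rough guide only; `bloom_false_positive_rate` is a hypothetical helper, not part of the gem.

```ruby
# Estimated false-positive rate of a classic Bloom filter:
#   p = (1 - e^(-k * n / m)) ** k
# slots  = m (assumed to match the :size option)
# hashes = k (assumed to match the :hashes option)
# items  = n (number of distinct keys inserted)
def bloom_false_positive_rate(slots:, hashes:, items:)
  (1.0 - Math.exp(-hashes * items.to_f / slots))**hashes
end

rate = bloom_false_positive_rate(slots: 212_000_000, hashes: 23, items: 7_500_000)
puts format("estimated false-positive rate: %.2e", rate)
```

Under those assumptions the estimate comes out on the order of one in a million once the filter is full, so only a handful of relationships at most would be expected to be dropped by mistake.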

Once it’s all said and done, we end up with about 2.5 million nodes and 7.5 million relationships.

For example, let’s see everything ConceptNet5 knows about Sushi:

START sushi=node:Concepts(id="/c/en/sushi")
MATCH sushi-[r]-other_concepts
RETURN TYPE(r), other_concepts.id

We imported all of the concepts into a “Concepts” index to make the graph easy to work with.
Here we are asking for every concept connected to the sushi concept, and asking the graph to tell us what type of relationship connects them.

==> +--------------------------------------------------------+
==> | TYPE(r)           | other_concepts.id                  |
==> +--------------------------------------------------------+
==> | "MadeOf"          | "/c/en/raw_fish"                   |
==> | "MotivatedByGoal" | "/c/en/eat_in_restaurant"          |
==> | "AtLocation"      | "/c/en/japanese_restaurant"        |
==> | "HasProperty"     | "/c/en/delicious"                  |
==> | "HasProperty"     | "/c/en/japanese_in_origin"         |
==> | "IsA"             | "/c/en/asian_food"                 |
==> | "IsA"             | "/c/en/from_japan"                 |
==> | "IsA"             | "/c/en/japanese_food"              |
==> | "IsA"             | "/c/en/food"                       |
==> | "IsA"             | "/c/en/fish"                       |
==> | "NotIsA"          | "/c/en/raw_fish"                   |
==> | "CapableOf"       | "/c/en/consist_mainly_of_raw_fish" |
==> | "ReceivesAction"  | "/c/en/eat_by_many_westerner"      |
==> +--------------------------------------------------------+

The results are quite interesting. Our graph knows sushi is made of raw fish and eaten in a restaurant, specifically a Japanese restaurant (it’s hard to find sushi at an Italian or Indian restaurant). The graph thinks sushi is delicious (I would agree, but some folks would violently disagree). Notice also that it has both a “NotIsA” link to raw_fish and a “CapableOf” link to consist_mainly_of_raw_fish, so our graph is smart enough to know that some sushi is not raw.
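
The sushi query works for any concept, since only the index lookup key changes. As a small sketch, a Ruby helper could build the query text for a given concept id. `concept_query` is a hypothetical name of my own, and the helper only produces the query string; how you send it to Neo4j depends on your driver or console. It refuses ids containing double quotes rather than guessing at escaping rules.

```ruby
# Hypothetical helper: builds the legacy Cypher text used in this article
# for an arbitrary concept id such as "/c/en/sushi".
def concept_query(concept_id)
  # Assumption: ids never need embedded double quotes, so reject them outright.
  raise ArgumentError, "concept id must not contain double quotes" if concept_id.include?('"')

  <<~CYPHER
    START concept=node:Concepts(id="#{concept_id}")
    MATCH concept-[r]-other_concepts
    RETURN TYPE(r), other_concepts.id
  CYPHER
end

puts concept_query("/c/en/sushi")
```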

If you ever happen to stop by the Neo4j office in San Mateo, CA, you’ll want to go to Sushi Sam’s for the best sushi in San Mateo. Let’s see what else the graph thinks is delicious:

START delicious=node:Concepts(id="/c/en/delicious")
MATCH delicious-[r]-other_concepts
RETURN TYPE(r), other_concepts.id

==> +------------------------------------+
==> | TYPE(r)       | other_concepts.id  |
==> +------------------------------------+
==> | "IsA"         | "/c/en/single"     |
==> | "NotIsA"      | "/c/en/nutricious" |
==> | "HasProperty" | "/c/en/ice_cream"  |
==> | "HasProperty" | "/c/en/atangerine" |
==> | "HasProperty" | "/c/en/banana"     |
==> | "HasProperty" | "/c/en/chicken"    |
==> | "HasProperty" | "/c/en/chocolate"  |
==> | "HasProperty" | "/c/en/beef"       |
==> | "HasProperty" | "/c/en/fruit"      |
==> | "HasProperty" | "/c/en/butter"     |
==> | "HasProperty" | "/c/en/meat"       |
==> | "HasProperty" | "/c/en/cake"       |
==> | "HasProperty" | "/c/en/sushi"      |
==> | "HasProperty" | "/c/en/marmite"    |
==> | "HasProperty" | "/c/en/cheese"     |
==> | "HasProperty" | "/c/en/tortilla"   |
==> +------------------------------------+

The “NotIsA” link tells us that delicious and “nutricious” (they probably meant nutritious) are not the same thing. I agree with most other things on here… but marmite? Seriously.

If you want to tackle something a bit bigger, you can look at the YAGO knowledge base, which has 10 million entities (such as persons, organizations, and cities) and contains more than 120 million facts about these entities.


Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)

Opinions expressed by DZone contributors are their own.
