
Knowledge Bases in Neo4j




From the second we are born, we collect a wealth of knowledge about the world. This knowledge accumulates and interrelates inside our brains, and it represents what we know. If we could export this knowledge and give it to a computer, it would look like ConceptNet. ConceptNet is a semantic network that…

…is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people’s goals.

I wrote a little Ruby script to import ConceptNet5 into Neo4j, and it gives us a nice graph (243MB) to work with. ConceptNet5 as presented in its CSV files is actually a hypergraph: each row records not just the relationship between two concepts but also the reason it was asserted:

/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]     /r/NotHasProperty       /c/en/old_map   /c/en/very_accurate     /ctx/all
        -1      /s/activity/omcs/vote,/s/contributor/omcs/PJ    /e/e529e3a070783cbbe212bc5e721b6938c0a6df6b     /d/conceptnet/4/en      [[Old maps]] are not [[very accurate]]
/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]     /r/NotHasProperty       /c/en/old_map   /c/en/very_accurate     /ctx/all
        -1      /s/activity/omcs/vote,/s/contributor/omcs/aghanford     /e/a8ecaed55f5ffba88b6d02da99ecf3fe42bffe55     /d/conceptnet/4/en      [[Old maps]] are not [[very accurate]]

Here two contributors tell us that old maps are not very accurate. That’s great to know, but we don’t need to represent it twice in our graph. So we detect and skip duplicate relationships, using a Bloom filter to check whether we have already seen each one.
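Only three of the columns matter for spotting the duplicate: the relation, the start concept, and the end concept. Here is a minimal sketch of pulling them out of a row. It assumes the rows are tab-separated in the column order visible in the sample above; the field labels and the helper names `parse_assertion` and `edge_key` are my own, not part of the import script.

```ruby
# Column labels are assumptions read off the sample rows, not an official schema.
FIELDS = %i[uri rel start end_node context weight sources id dataset surface_text]

# Split one tab-separated assertion line into a hash keyed by FIELDS.
def parse_assertion(line)
  FIELDS.zip(line.chomp.split("\t")).to_h
end

# Build the same "from-to-rel" style key used for the duplicate check.
def edge_key(row)
  "#{row[:start]}-#{row[:end_node]}-#{row[:rel]}"
end

# The two sample rows share everything up to the weight column.
common = ["/a/[/r/NotHasProperty/,/c/en/old_map/,/c/en/very_accurate/]",
          "/r/NotHasProperty", "/c/en/old_map", "/c/en/very_accurate", "/ctx/all", "-1"]
tail = ["/d/conceptnet/4/en", "[[Old maps]] are not [[very accurate]]"]

line_pj = (common + ["/s/activity/omcs/vote,/s/contributor/omcs/PJ",
                     "/e/e529e3a070783cbbe212bc5e721b6938c0a6df6b"] + tail).join("\t")
line_ag = (common + ["/s/activity/omcs/vote,/s/contributor/omcs/aghanford",
                     "/e/a8ecaed55f5ffba88b6d02da99ecf3fe42bffe55"] + tail).join("\t")

key_pj = edge_key(parse_assertion(line_pj))
key_ag = edge_key(parse_assertion(line_ag))
puts key_pj == key_ag
```

The rows differ only in their sources and edge id, so both produce the same key, and that key is what gets tested for membership.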

@edge_bf = BloomFilter::Native.new(:size => 212000000, :hashes => 23, :bucket => 8, :raise => false)
def is_unique_rel(from,to,rel)
  return false if @edge_bf.include?("#{from}-#{to}-#{rel}")
  @edge_bf.insert("#{from}-#{to}-#{rel}")  # remember this edge for later checks
  true
end
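To make the idea concrete without the native gem, here is a rough pure-Ruby sketch of a Bloom filter and the same check-then-insert pattern. `TinyBloom` and `unique_rel?` are hypothetical names for illustration, and the MD5-with-salt hashing is my own simplification, not how `BloomFilter::Native` works internally.

```ruby
require 'digest'

# Hypothetical, minimal Bloom filter for illustration only.
class TinyBloom
  def initialize(size, hashes)
    @size = size
    @hashes = hashes
    @bits = Array.new(size, false)
  end

  # Set every bit position derived from the key.
  def insert(key)
    positions(key).each { |i| @bits[i] = true }
  end

  # True only if every derived bit is set (may be a false positive).
  def include?(key)
    positions(key).all? { |i| @bits[i] }
  end

  private

  # Derive @hashes positions by salting the key with the hash index.
  def positions(key)
    (0...@hashes).map { |n| Digest::MD5.hexdigest("#{n}:#{key}").to_i(16) % @size }
  end
end

# Returns true the first time an edge is seen, false afterwards.
def unique_rel?(filter, from, to, rel)
  key = "#{from}-#{to}-#{rel}"
  return false if filter.include?(key)
  filter.insert(key)
  true
end

filter = TinyBloom.new(1024, 3)
first  = unique_rel?(filter, "/c/en/old_map", "/c/en/very_accurate", "NotHasProperty")
second = unique_rel?(filter, "/c/en/old_map", "/c/en/very_accurate", "NotHasProperty")
puts first   # true: the filter was empty
puts second  # false: the same edge was already inserted
```

A Bloom filter never misses a key it has stored, but it can occasionally report a key it has never seen. For this import that means a few genuinely new relationships could be skipped, in exchange for a membership test that stays small even with millions of edges.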

Once it’s all said and done, we end up with about 2.5 million nodes and 7.5 million relationships.


For example, let’s see everything ConceptNet5 knows about Sushi:


START sushi=node:Concepts(id="/c/en/sushi")
MATCH sushi-[r]-other_concepts
RETURN TYPE(r), other_concepts.id

We imported all of the concepts into a “Concepts” index to make the graph easy to work with.
Here we ask for every concept connected to the sushi concept, and for the type of relationship that exists between them.

==> +--------------------------------------------------------+
==> | TYPE(r)           | other_concepts.id                  |
==> +--------------------------------------------------------+
==> | "MadeOf"          | "/c/en/raw_fish"                   |
==> | "MotivatedByGoal" | "/c/en/eat_in_restaurant"          |
==> | "AtLocation"      | "/c/en/japanese_restaurant"        |
==> | "HasProperty"     | "/c/en/delicious"                  |
==> | "HasProperty"     | "/c/en/japanese_in_origin"         |
==> | "IsA"             | "/c/en/asian_food"                 |
==> | "IsA"             | "/c/en/from_japan"                 |
==> | "IsA"             | "/c/en/japanese_food"              |
==> | "IsA"             | "/c/en/food"                       |
==> | "IsA"             | "/c/en/fish"                       |
==> | "NotIsA"          | "/c/en/raw_fish"                   |
==> | "CapableOf"       | "/c/en/consist_mainly_of_raw_fish" |
==> | "ReceivesAction"  | "/c/en/eat_by_many_westerner"      |
==> +--------------------------------------------------------+

The results are quite interesting. Our graph knows sushi is made of raw fish and is eaten in a restaurant, specifically a Japanese restaurant (it is hard to find sushi at an Italian or Indian restaurant). The graph thinks sushi is delicious (I would agree, but some folks would violently disagree). Notice also that sushi has a “NotIsA” link to raw_fish alongside a “CapableOf” link to consist_mainly_of_raw_fish, so our graph is smart enough to know that some sushi is not raw.
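The pattern `sushi-[r]-other_concepts` in the query has no arrow, so it matches relationships in either direction. A rough Ruby sketch of that kind of lookup follows, over a hypothetical in-memory edge list. The few edges are taken from the result table, but their directions are assumed for illustration (the table does not show them), the marmite edge is added only to give `/c/en/delicious` a second neighbor, and `neighbors` is my own helper name.

```ruby
# Hypothetical edge list: [start, type, end]. Directions are assumed,
# since the query output does not reveal them.
EDGES = [
  ["/c/en/sushi", "MadeOf", "/c/en/raw_fish"],
  ["/c/en/sushi", "HasProperty", "/c/en/delicious"],
  ["/c/en/sushi", "IsA", "/c/en/japanese_food"],
  ["/c/en/marmite", "HasProperty", "/c/en/delicious"]
]

# Return [type, other_id] pairs for every edge touching concept_id,
# regardless of direction, like a Cypher pattern written without an arrow.
def neighbors(edges, concept_id)
  edges.each_with_object([]) do |(from, type, to), found|
    found << [type, to] if from == concept_id
    found << [type, from] if to == concept_id
  end
end

sushi = neighbors(EDGES, "/c/en/sushi")
delicious = neighbors(EDGES, "/c/en/delicious")

sushi.each { |type, other| puts "#{type} #{other}" }
```

Because direction is ignored, the same `HasProperty` edge shows up whether you start from `/c/en/sushi` or from `/c/en/delicious`.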

If you ever happen to stop by the Neo4j office in San Mateo, CA, you’ll want to go to Sushi Sam’s for the best sushi in San Mateo. Let’s see what else the graph thinks is delicious:

START delicious=node:Concepts(id="/c/en/delicious")
MATCH delicious-[r]-other_concepts
RETURN TYPE(r), other_concepts.id

==> +------------------------------------+
==> | TYPE(r)       | other_concepts.id  |
==> +------------------------------------+
==> | "IsA"         | "/c/en/single"     |
==> | "NotIsA"      | "/c/en/nutricious" |
==> | "HasProperty" | "/c/en/ice_cream"  |
==> | "HasProperty" | "/c/en/atangerine" |
==> | "HasProperty" | "/c/en/banana"     |
==> | "HasProperty" | "/c/en/chicken"    |
==> | "HasProperty" | "/c/en/chocolate"  |
==> | "HasProperty" | "/c/en/beef"       |
==> | "HasProperty" | "/c/en/fruit"      |
==> | "HasProperty" | "/c/en/butter"     |
==> | "HasProperty" | "/c/en/meat"       |
==> | "HasProperty" | "/c/en/cake"       |
==> | "HasProperty" | "/c/en/sushi"      |
==> | "HasProperty" | "/c/en/marmite"    |
==> | "HasProperty" | "/c/en/cheese"     |
==> | "HasProperty" | "/c/en/tortilla"   |
==> +------------------------------------+

Anything that is not “nutricious” (they probably meant nutritious) is not delicious. I agree with most other things on here… but marmite? Seriously.


If you want to tackle something a bit bigger, you can look at the YAGO knowledge base, which has 10 million entities (such as people, organizations, and cities) and contains more than 120 million facts about them.




Published at DZone with permission of Max De Marzi, DZone MVB. See the original article here.

