Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j: Facts As Nodes

DZone's Guide to

Neo4j: Facts As Nodes

Here are my thoughts on the "facts as nodes" modeling pattern with examples provided.

· Database Zone
Free Resource

Read why times series is the fastest growing database category.

Recently, I spoke at the Neo4j London user group about incrementally building a recommendation engine and described the "facts as nodes" modeling pattern, defined as follows in the Graph Databases book:

When two or more domain entities interact for a period of time, a fact emerges. We represent a fact as a separate node with connections to each of the entities engaged in that fact.

Modeling an action in terms of its product—that is, in terms of the thing that results from the action—produces a similar structure: an intermediate node that represents the outcome of an interaction between two or more entities.

We started with the following model describing a meetup member and the groups they’ve joined:

2015 12 04 07 26 11

This model works well for the query it was defined for–find groups similar to ones that I’m already a member of:

MATCH (member:Member {name: "Mark Needham"})-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) 
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC

Prefixing that query with the "PROFILE" keyword yields a query plan and the following summary text:

Cypher version: CYPHER 2.3, planner: COST. 89100 total db hits in 113 ms.

In this model it feels like there is a membership fact waiting to become a node.

2015 12 04 07 35 38

We can refactor towards that model with the following query:

MATCH (member:Member)-[rel:MEMBER_OF]->(group)

MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined

MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group);

We’d answer our initial question with the following query:

MATCH (member:Member {name: "Mark Needham"})-[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) 
WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC

And, at the following cost:

Cypher version: CYPHER 2.3, planner: COST. 468201 total db hits in 346 ms.

The membership node hasn’t proved its value yet; it does 4x more work to get the same result. However, the next question we want to answer is "What group do people join after the Neo4j user group?" where it might come in handy.

First, we’ll add a "NEXT" relationship between a user’s adjacent group memberships by writing the following query:

MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)

WITH member, membership ORDER BY member.id, membership.joined

WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0,SIZE(memberships) - 2) as idx

WITH memberships[idx] AS m1, memberships[idx+1] AS m2
MERGE (m1)-[:NEXT]->(m2);

And, now for the query:

MATCH (group:Group {name: "Neo4j - London User Group"})<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),         
      (membership)<-[:HAS_MEMBERSHIP]-(member:Member)-[:HAS_MEMBERSHIP]->(nextMembership),
      (nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
Cypher version: CYPHER 2.3, planner: COST. 23671 total db hits in 39 ms.

And for comparison–the same query using the initial model:

MATCH (group:Group {name: "Neo4j - London User Group"})<-[membership:MEMBER_OF]-(member),
      (member)-[otherMembership:MEMBER_OF]->(otherGroup)
WHERE membership.joined < otherMembership.joined
WITH member, otherGroup 
ORDER BY otherMembership.joined
WITH member, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
Cypher version: CYPHER 2.3, planner: COST. 86179 total db hits in 138 ms.

This time the membership model does 3x less work, so depending on the question, a different model works better.

Given this observation, we might choose to keep both models. The disadvantage of doing that is that we pay write and maintenance penalties to keep them both in sync. For example, this is what queries to add a new membership or remove one would look like

Adding Group Membership

WITH "Mark Needham" AS memberName, 
     "Neo4j - London User Group" AS groupName,
     timestamp() AS now

MATCH (group:Group {name: groupName})
MATCH (member:Member {name: memberName})

MERGE (member)-[memberOfRel:MEMBER_OF]->(group)
ON CREATE SET memberOfRel.time = now

MERGE (membership:Membership {id: member.id + "_" + group.id})
ON CREATE SET membership.joined = now
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)

Removing Group Membership

WITH "Mark Needham" AS memberName, 
     "Neo4j - London User Group" AS groupName,
     timestamp() AS now

MATCH (group:Group {name: groupName})
MATCH (member:Member {name: memberName})

MATCH (member)-[memberOfRel:MEMBER_OF]->(group)

MATCH (membership:Membership {id: member.id + "_" + group.id})
MATCH (member)-[hasMembershipRel:HAS_MEMBERSHIP]->(membership)
MATCH (membership)-[ofGroupRel:OF_GROUP]->(group)

DELETE memberOfRel, hasMembershipRel, ofGroupRel

The dataset is on github so take a look at it and send any questions my way.

Learn how to get 20x more performance than Elastic by moving to a Time Series database.

Topics:
graph databases ,database ,neo4j

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}