Sampling A Neo4j Database

By Michael Hunger · Apr. 21, 14 · Interview
After reading my colleague Rik van Bruggen's interesting blog post on “Media, Politics and Graphs”, I thought it would be really cool to render it as a GraphGist, especially as he had already shared all the queries as a GitHub Gist.

Unfortunately, the dataset was a bit too large for a sensible GraphGist representation, so I looked for a way to extract a smaller sample of the raw data he made available (see his blog post for the link).

Considering my last blog post on creating data from sampling a cross product, this should be much easier. We want all nodes with the labels PARTY, SHOW and GENDER in our graph, plus a sample of GUEST nodes with their relationships.

The first part is easy:

MATCH (n)
WHERE n:PARTY OR n:SHOW OR n:GENDER
RETURN n;

The second part relies on something that was not helpful in my last exploration: when random sampling is applied directly to a match, it filters the first node pattern of the match, and all relationships/paths emanating from each surviving node are still traversed.

MATCH(n:GUEST)-[r]->()
WHERE rand() < 0.1
RETURN n,r;

The number you compare rand() to is the fraction of nodes you want to get back; in this example 0.1, i.e. roughly 10%.
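
To make the sampling behaviour concrete, here is a minimal Python sketch (not Neo4j code) of what the query above is described as doing: each start node is kept with probability 0.1, and every outgoing relationship of a kept node comes along. The toy graph, the function name sample_guests and the seed are assumptions made up for illustration.

```python
import random

def sample_guests(relationships, fraction, rng):
    """Keep each start node with probability `fraction`, then return
    every relationship that starts at a kept node."""
    guests = sorted({start for start, _ in relationships})
    kept = {g for g in guests if rng.random() < fraction}
    return kept, [(s, e) for s, e in relationships if s in kept]

# Hypothetical toy data: 1000 guests, each visiting two shows.
rels = [(f"guest{i}", show) for i in range(1000) for show in ("DWDD", "P&W")]
kept, sampled = sample_guests(rels, 0.1, random.Random(42))
print(len(kept), len(sampled))
```

Because every node is tested independently, the result contains about 10% of the guests rather than exactly 10%, and without a fixed seed it differs from run to run.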

Now I have two nice queries that get me the data. How can I bring them together? With UNION ALL:

MATCH (n)
WHERE n:PARTY OR n:SHOW OR n:GENDER
RETURN n, null as r
UNION ALL
MATCH(n:GUEST)-[r]->()
WHERE rand() < 0.1
RETURN n,r;
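
The `null as r` in the first half is there because both sides of a UNION must return the same column names. As a rough illustration only, this Python sketch mimics that rule with plain lists; union_all and the sample rows are hypothetical, and None plays the role of Cypher's null.

```python
def union_all(columns_a, rows_a, columns_b, rows_b):
    """Concatenate two result sets, insisting on identical column names."""
    if columns_a != columns_b:
        raise ValueError(f"column mismatch: {columns_a} vs {columns_b}")
    # Like UNION ALL, keep every row; duplicates are not removed.
    return columns_a, rows_a + rows_b

# Hypothetical rows standing in for the two query results.
fixed = (["n", "r"], [("party1", None), ("show1", None)])
guests = (["n", "r"], [("guest1", "VISITED"), ("guest1", "VISITED")])
columns, rows = union_all(*fixed, *guests)
print(columns, len(rows))
```

UNION ALL keeps all rows from both parts, whereas a plain UNION would also remove duplicate rows.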

And where do I get the Cypher statements to populate my GraphGist database setup? Fortunately, my dump command made it into the Neo4j-Shell, so we can just run it on the command line and redirect the output into a file:

bin/neo4j-shell -path talkshow/graph.db \
-c 'dump
MATCH (n) WHERE n:PARTY OR n:SHOW OR n:GENDER RETURN n, null as r
UNION ALL
MATCH(n:GUEST)-[r]->() WHERE rand() < 0.1 RETURN n,r;' \
> talkshow/sample.cql

Don’t forget the semicolon at the end! Looking at sample.cql we see something like:

begin
create (_0:`SHOW` {`Modularity Name`:"B&vD", `id`:"B&vD", `label`:"B&vD", `modularity_class`:3, `weighted outdegree`:0.000000})
create (_1:`SHOW` {`Modularity Name`:"P&W", `id`:"P&W", `label`:"P&W", `modularity_class`:4, `weighted outdegree`:0.000000})
create (_2:`SHOW` {`Modularity Name`:"DWDD", `id`:"DWDD", `label`:"DWDD", `modularity_class`:5, `weighted outdegree`:0.000000})
...
...
create _509-[:`VISITED` {`quantity`:1}]->_5
create _509-[:`VISITED` {`quantity`:1}]->_2
create _509-[:`VISITED` {`quantity`:1}]->_1
create _509-[:`VISITED` {`quantity`:1}]->_0
;
commit
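
A quick way to check how big the sample turned out is to count the create statements in the dump. The following Python sketch assumes the dump format matches the excerpt shown above (node lines start with `create (_N`, relationship lines with `create _N-[`); count_creates and the shortened excerpt are illustrative, not part of the Neo4j tooling.

```python
import re

NODE = re.compile(r"^create \(_\d+")   # e.g. create (_0:`SHOW` {...})
REL = re.compile(r"^create _\d+-\[")   # e.g. create _509-[:`VISITED` ...]->_5

def count_creates(lines):
    """Return (node_count, relationship_count) for a dump given as lines."""
    nodes = sum(1 for line in lines if NODE.match(line))
    rels = sum(1 for line in lines if REL.match(line))
    return nodes, rels

# Shortened excerpt of the listing above, properties abbreviated.
excerpt = """begin
create (_0:`SHOW` {`id`:"B&vD"})
create (_1:`SHOW` {`id`:"P&W"})
create _509-[:`VISITED` {`quantity`:1}]->_1
create _509-[:`VISITED` {`quantity`:1}]->_0
;
commit""".splitlines()
print(count_creates(excerpt))  # (2, 2)
```

For the real file, the same function can be fed an open file handle, for example `count_creates(open("talkshow/sample.cql"))`.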

We can now use this to populate the database for our GraphGist, and here it is in all its beauty: GraphGist “Media, Politics and Graphs”. In the end I chose not to use Rik’s GitHub Gist with the queries, but to copy the nice text and pictures from his blog post into the GraphGist instead.

You might notice that some of the parties go without connections. Fixing that would need some tweaking of the sampling, which I leave as an exercise for you.
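
As a starting point for that exercise, here is one possible tweak sketched in Python rather than Cypher; it is not the approach used for the GraphGist. After the random sample, every required target that ended up without a connection gets one extra start node that points at it. The name sample_with_coverage and the toy data are assumptions for illustration.

```python
import random

def sample_with_coverage(relationships, required, fraction, rng):
    """Randomly sample start nodes, then top up so that every target in
    `required` is reached by at least one kept start node."""
    guests = sorted({g for g, _ in relationships})
    kept = {g for g in guests if rng.random() < fraction}
    covered = {t for g, t in relationships if g in kept}
    for target in sorted(required - covered):
        candidates = sorted(g for g, t in relationships if t == target)
        if candidates:  # a target nobody points at stays unconnected
            kept.add(rng.choice(candidates))
    return kept, [(g, t) for g, t in relationships if g in kept]

# Hypothetical data: 'small' has a single member and is easy to miss.
rels = [(f"g{i}", "big") for i in range(200)] + [("g200", "small")]
kept, sampled = sample_with_coverage(rels, {"big", "small"}, 0.1, random.Random(1))
print({t for _, t in sampled} == {"big", "small"})  # True
```

The top-up biases the sample slightly towards members of small groups, which seems an acceptable trade-off for a demo dataset.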

Have fun

Michael

Database Neo4j

Published at DZone with permission of . See the original article here.
