Sampling A Neo4j Database

By Michael Hunger · Apr. 21, 14 · Interview

After reading the interesting blog post by my colleague Rik van Bruggen on “Media, Politics and Graphs”, I thought it would be really cool to render it as a GraphGist, especially as he had already shared all the queries as a GitHub Gist.

Unfortunately, the dataset was a bit large for a sensible GraphGist representation, so I thought about ways to extract a smaller sample of the raw data he made available (see his blog post for the link).

Considering my last blog post on creating data from sampling a cross product, this should be much easier. We know we want to have all nodes with the labels PARTY, SHOW and GENDER in our graph as well as a sample of GUEST nodes with their relationships.

The first part is easy:

MATCH (n)
WHERE n:PARTY OR n:SHOW OR n:GENDER
RETURN n;
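
To make the semantics of this first query concrete outside Neo4j, here is a minimal in-memory Python sketch. The Node class and the toy nodes are hypothetical stand-ins, not part of any Neo4j API; the function keeps every node that carries at least one of the three labels, which is what WHERE n:PARTY OR n:SHOW OR n:GENDER expresses.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a Neo4j node (not a Neo4j API).
@dataclass
class Node:
    id: int
    labels: set[str]
    props: dict = field(default_factory=dict)

# Labels we want in full, mirroring the WHERE clause of the first query.
KEEP_LABELS = {"PARTY", "SHOW", "GENDER"}

def select_fixed_nodes(nodes: list[Node]) -> list[Node]:
    """Return every node with at least one of KEEP_LABELS (n:PARTY OR n:SHOW OR n:GENDER)."""
    return [n for n in nodes if n.labels & KEEP_LABELS]

if __name__ == "__main__":
    # Hypothetical toy nodes.
    nodes = [
        Node(0, {"SHOW"}, {"id": "DWDD"}),
        Node(1, {"PARTY"}),
        Node(2, {"GUEST"}),
        Node(3, {"GENDER"}),
    ]
    print([n.id for n in select_fixed_nodes(nodes)])  # [0, 1, 3]
```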

The second part uses something that was not helpful in my last exploration: random sampling, when applied directly to a MATCH, filters the first node pattern in the match and then still traverses all relationships/paths emanating from the nodes that pass the filter.

MATCH(n:GUEST)-[r]->()
WHERE rand() < 0.1
RETURN n,r;

The number you compare rand() to is the fraction you want to get back: rand() returns a uniformly distributed value between 0 and 1, so rand() < 0.1 keeps roughly (not exactly) 10%.
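
The sampling behaviour described above, where the random predicate decides once per GUEST node and a kept guest brings along all of its outgoing relationships, can be modelled in Python as follows. The adjacency-dict shape, the function name and the toy data are hypothetical; random.Random.random() plays the role of Cypher's rand(), returning a uniform value in [0, 1).

```python
import random

def sample_guests(out_rels: dict[str, list[tuple[str, str]]],
                  fraction: float,
                  rng: random.Random) -> list[tuple[str, tuple[str, str]]]:
    """Keep each guest with probability `fraction` and return one (guest, rel)
    row per outgoing relationship of every kept guest.

    out_rels maps a guest id to its outgoing (rel_type, target_id) pairs
    (a hypothetical in-memory shape, not a Neo4j structure).
    """
    rows = []
    for guest, rels in out_rels.items():
        # One draw per guest node: random() is uniform in [0, 1), like rand(),
        # so the guest is kept with probability `fraction`.
        if rng.random() < fraction:
            # A kept guest brings along ALL of its outgoing relationships.
            rows.extend((guest, rel) for rel in rels)
    return rows

if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical toy data: 1,000 guests with two outgoing relationships each.
    graph = {f"guest_{i}": [("VISITED", "show_DWDD"), ("VISITED", "show_PW")]
             for i in range(1000)}
    rows = sample_guests(graph, 0.1, rng)
    kept = {guest for guest, _ in rows}
    # About 100 guests (the exact count depends on the seed), twice as many rows.
    print(len(kept), "guests kept,", len(rows), "rows")
```

Because the draw happens per guest rather than per relationship, the sample never contains a guest with only some of its relationships.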

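Now I have two nice queries that get me the data. How can I bring them together? With UNION ALL. Both parts of a UNION have to return the same column names, which is why the first part returns null as r: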

MATCH (n)
WHERE n:PARTY OR n:SHOW OR n:GENDER
RETURN n, null as r
UNION ALL
MATCH(n:GUEST)-[r]->()
WHERE rand() < 0.1
RETURN n,r;
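
UNION ALL appends the rows of the second query to those of the first without removing duplicates (plain UNION would deduplicate). A rough Python analogue, with rows as dicts and None standing in for Cypher's null; the helper name and the rows are hypothetical:

```python
def union_all(first: list[dict], second: list[dict]) -> list[dict]:
    """Append the rows of `second` to `first`, keeping duplicates, like UNION ALL."""
    # Every row of both parts must expose the same column names.
    column_sets = {frozenset(row) for row in first + second}
    if len(column_sets) > 1:
        raise ValueError("UNION ALL needs both parts to return the same columns")
    return first + second

if __name__ == "__main__":
    # Hypothetical rows: the fixed nodes are padded with r = None (null).
    fixed = [{"n": "show_DWDD", "r": None}, {"n": "party_1", "r": None}]
    sampled = [{"n": "guest_42", "r": "VISITED->show_DWDD"}]
    print(union_all(fixed, sampled))
```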

And where do I get the Cypher statements that I can use to populate my GraphGist database setup? Fortunately, my dump command made it into the Neo4j Shell, so we can just run it on the command line and redirect the output into a file:

bin/neo4j-shell -path talkshow/graph.db \
-c 'dump
MATCH (n) WHERE n:PARTY OR n:SHOW OR n:GENDER RETURN n, null as r
UNION ALL
MATCH(n:GUEST)-[r]->() WHERE rand() < 0.1 RETURN n,r;' \
> talkshow/sample.cql

Don’t forget the semicolon at the end! Looking at sample.cql, we see something like:

begin
create (_0:`SHOW` {`Modularity Name`:"B&vD", `id`:"B&vD", `label`:"B&vD", `modularity_class`:3, `weighted outdegree`:0.000000})
create (_1:`SHOW` {`Modularity Name`:"P&W", `id`:"P&W", `label`:"P&W", `modularity_class`:4, `weighted outdegree`:0.000000})
create (_2:`SHOW` {`Modularity Name`:"DWDD", `id`:"DWDD", `label`:"DWDD", `modularity_class`:5, `weighted outdegree`:0.000000})
...
...
create _509-[:`VISITED` {`quantity`:1}]->_5
create _509-[:`VISITED` {`quantity`:1}]->_2
create _509-[:`VISITED` {`quantity`:1}]->_1
create _509-[:`VISITED` {`quantity`:1}]->_0
;
commit
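
The dump output is a single transaction of CREATE statements: each node gets a generated identifier (_0, _1, and so on) that the later relationship statements refer to. The Python sketch below reproduces that shape for a small in-memory sample. It is a hypothetical illustration of the format shown above, not the Neo4j Shell's actual dump implementation; value formatting is simplified (floats get six decimals to mimic the output above, strings are double-quoted with minimal escaping).

```python
def fmt_value(v) -> str:
    """Render a property value roughly the way the dump output above does."""
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return str(v)
    if isinstance(v, float):
        return f"{v:.6f}"  # mimics 0.000000 in the sample output
    # Strings: wrap in double quotes, escaping backslashes and quotes.
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'

def fmt_props(props: dict) -> str:
    """Render a property map as ' {`key`:value, ...}', or nothing if empty."""
    if not props:
        return ""
    inner = ", ".join(f"`{k}`:{fmt_value(v)}" for k, v in props.items())
    return " {" + inner + "}"

def dump(nodes: list[tuple[list[str], dict]],
         rels: list[tuple[int, str, int, dict]]) -> str:
    """Serialize nodes and relationships as one begin/commit block of CREATE statements.

    nodes: (labels, props) pairs; node i is referred to as _i.
    rels:  (start_index, rel_type, end_index, props) tuples.
    """
    lines = ["begin"]
    for i, (labels, props) in enumerate(nodes):
        label_str = "".join(f":`{label}`" for label in labels)
        lines.append(f"create (_{i}{label_str}{fmt_props(props)})")
    for start, rel_type, end, props in rels:
        lines.append(f"create _{start}-[:`{rel_type}`{fmt_props(props)}]->_{end}")
    lines += [";", "commit"]
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical two-node sample, shaped like the output above.
    print(dump([(["SHOW"], {"id": "DWDD", "modularity_class": 5}), (["GUEST"], {})],
               [(1, "VISITED", 0, {"quantity": 1})]))
```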

We can now use this file to populate the database for our GraphGist, and here it is in all its beauty: GraphGist: “Media, Politics and Graphs”. In the end, though, I chose not to use Rik’s GitHub Gist with the queries, but copied the nice text and pictures from his blog post into the GraphGist instead.

You might notice that some of the parties have no connections. Fixing that would need some tweaking of the sampling, which I leave as an exercise for you.
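
One possible tweak, sketched in Python under the assumption (suggested by the GUEST-rooted query) that a party's connections come from guests: after the plain random draw, add one randomly chosen member guest for every party that ended up uncovered. This is a hypothetical illustration, not the author's solution; the function name and the guest-to-parties mapping are assumptions.

```python
import random

def sample_with_party_coverage(guest_parties: dict[str, set[str]],
                               parties: list[str],
                               fraction: float,
                               rng: random.Random) -> set[str]:
    """Sample guests with probability `fraction`, then top up the sample so that
    every party with at least one guest keeps a connection.

    guest_parties maps a guest id to the party ids it is connected to
    (a hypothetical in-memory shape).
    """
    # Plain random sample, as with rand() < fraction.
    sampled = {g for g in guest_parties if rng.random() < fraction}
    covered = set().union(*(guest_parties[g] for g in sampled))
    for party in parties:
        if party in covered:
            continue
        members = [g for g, ps in guest_parties.items() if party in ps]
        if members:  # a party nobody is connected to cannot be fixed by sampling
            pick = rng.choice(members)
            sampled.add(pick)
            covered |= guest_parties[pick]
    return sampled
```

The top-up step slightly over-represents guests of small parties, which is the usual trade-off when a uniform sample is forced to cover every group.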

Have fun

Michael

Published at DZone with permission of Michael Hunger, DZone MVB. See the original article here.
