Neo4J and Virtual Nodes/Relationships

Let's take a look at Neo4j and virtual nodes and relationships.

By Scott Sosna
Oct. 01, 19 · Tutorial

Neo4j and virtual nodes

Overview

The database schema used for loading data often doesn't translate well to querying or reporting, such as generating aggregate or summary reports from the source-of-truth transactional data. With relational databases, a common solution is to define a schema better aligned with the reporting requirements. The data is then extracted from the transactional database, transformed, and loaded into the new schema, a process known as ETL.

Schemas created for Neo4J and other NoSQL databases may also present reporting challenges that can be solved in different ways. For Neo4J, virtual nodes and relationships are a means of transforming the data in place without creating a separate schema or data store.

Context

The data set used in this article comes from federally mandated filings for political lobbying in the United States. For more background, refer to Loading US Lobbying Data into Neo4J and Analyzing US Lobbying Data in Neo4J. The source project can be found on GitHub.

The data schema used is below:

Data schema

Definitions

  • Filing: Represents a single lobbying effort and holds the details of the filing (unique identifier, period represented, dollar amount spent, date of filing, detailed description), with relationships to additional details.
  • Client: Special interest groups — e.g., corporations, non-profits, industries, national and international governments — advocating for/against legislation or regulations under consideration by the federal government.
  • Lobbyist: A professional hired by the client to present the client's position and persuade the federal government to take the client's position with regard to proposed legislation and regulations.
  • Registrant: The organization performing lobbying activities on behalf of the client, registered with the US government. Clients may lobby on their own behalf as both client and registrant or may hire firms who specialize in lobbying and hire lobbyists.
  • Government Entity: A department, regulatory agency, commission, or branch of government lobbied. Multiple entities are usually associated with a single filing; by far the most lobbied entities are the legislative branches, the Senate and House of Representatives.
  • Issue: Filings are assigned to general categories to simplify reporting, such as Education, Transportation, and Natural Resources. The specifics of the lobbying effort are described with each filing.

Problem: Visualizing the Raw Data

Let's explore the loaded data set, starting with a registrant and branching out. You'll see how cluttered the browser becomes as we navigate relationships by expanding nodes.

Step 1

MATCH a single (:Registrant).

MATCH (r:Registrant) RETURN r LIMIT 1

Single Registrant

Step 2

Double-click on the (:Registrant) node to display related nodes.

Expand Registrant

Step 3

Expand a single (:Filing) node.

Step 4

Expand a (:GovernmentEntity) node.

Step 5

Expand another (:GovernmentEntity) node.

Expand Second Filing

(:Filing) nodes quickly overwhelm the graph because they are the unifying node type, related to all other node types. Unlike the other node types in this schema, all Filings are singletons. Any worthwhile Cypher query, whether used for visualization or for returning tabular data, must include (:Filing), adding complexity and noise to any results.

Solution: Virtual Nodes and Relationships

Definition

Virtual nodes and relationships are created using the Neo4J APOC library in a Cypher statement. Unlike nodes and relationships created and stored in Neo4J, virtual nodes and relationships are transitory and only exist during query execution.

After installing the APOC library, change the Neo4J config file to give the apoc.nodes functions unrestricted access:

dbms.security.procedures.unrestricted=apoc.nodes.*

Usage

The functions for creating virtual nodes and relationships are fairly simple.

WITH apoc.create.vNode(['vnode'], {name:'one'}) AS one, 
     apoc.create.vNode(['vnode'], {name:'other'}) AS other
RETURN one, 
       other,
       apoc.create.vRelationship(one, 'related', {name:'one-to-other relationship'}, other) as vrel

Both virtual nodes and virtual relationships may have properties, supplied at creation time using Cypher's JSON-like map syntax.

You can differentiate persisted and virtual nodes/relationships by the Neo4J-generated IDs, which are negative for virtual.
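
As a minimal sketch of that check, the query below creates one virtual node and returns its ID. It assumes that the built-in id() function accepts virtual nodes in your Neo4J and APOC versions; the aliases virtualId and isVirtual are illustrative names, not part of the original project.

```cypher
// Create a throwaway virtual node; nothing is written to the database.
WITH apoc.create.vNode(['vnode'], {name:'one'}) AS one
// A virtual node is expected to carry a negative ID,
// so isVirtual should come back true.
RETURN id(one) AS virtualId,
       id(one) < 0 AS isVirtual
```

The same comparison can be applied to a relationship returned by apoc.create.vRelationship.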

Example 1: What Government Entities Does a Registrant Lobby?

The question is whether registrants target specific government entities in their lobbying efforts. Individual filings aren't important here; what we want to know is how many filings each registrant made and their total dollar amount. To do this, we can create virtual nodes for the registrants and government entities and a virtual relationship between the two.

Solution #1

The following is the complete Cypher command, which I'll break out into explainable chunks.

MATCH (r:Registrant)-[:FILED]->(f:Filing)-[:TARGETED_AT]->(g:GovernmentEntity)
WITH r, f, g, SUM(f.amount) AS amt,
     apoc.date.fields(LEFT(f.receivedOn, 10), 'yyyy-MM-dd') AS received
WHERE received.years = 2018 AND 
      received.months = 3 AND
      amt > 100000 AND
      g.name <> 'SENATE' AND 
      g.name <> 'HOUSE OF REPRESENTATIVES'
WITH COLLECT(DISTINCT r.name) AS registrants,
     COLLECT(DISTINCT g.name) AS gents
WITH [gname IN gents | apoc.create.vNode(['gent'],{name:gname})] AS gNodes,
     [rname in registrants | 
       apoc.create.vNode(['Registrant'],{name:rname})] AS rNodes
WITH apoc.map.groupBy(gNodes, 'name') AS gvs,
     apoc.map.groupBy(rNodes, 'name') AS rvs
MATCH (r:Registrant)-[:FILED]->(f:Filing)-[:TARGETED_AT]->(g:GovernmentEntity)
WITH gvs, rvs, r, f, g, SUM (f.amount) AS amt,
     apoc.date.fields(LEFT(f.receivedOn, 10), 'yyyy-MM-dd') AS received
WHERE received.years = 2018 AND 
      received.months = 3 AND
      amt > 100000 AND
      g.name <> 'SENATE' AND 
      g.name <> 'HOUSE OF REPRESENTATIVES'
RETURN rvs,
       gvs,
       apoc.create.vRelationship (rvs[r.name], 'LOBBIED',
          {filingCnt:COUNT(f), filingAmt:SUM(f.amount)}, 
          gvs[g.name]) AS rel

Part 1

Identify and filter the persisted data of interest: in this example, filings from March 2018 over $100,000. The legislative branch is ignored because the vast majority of filings include the House, the Senate, or both.

The names of the matched registrants and government entities are collected into lists to allow iterating later.

MATCH (r:Registrant)-[:FILED]->(f:Filing)-[:TARGETED_AT]->(g:GovernmentEntity)
WITH r, f, g, SUM(f.amount) AS amt,
     apoc.date.fields(LEFT(f.receivedOn, 10), 'yyyy-MM-dd') AS received
WHERE received.years = 2018 AND 
      received.months = 3 AND
      amt > 100000 AND
      g.name <> 'SENATE' AND 
      g.name <> 'HOUSE OF REPRESENTATIVES'
WITH COLLECT(DISTINCT r.name) AS registrants,
     COLLECT(DISTINCT g.name) AS gents

Part 2

Next, iterate through the names and create a virtual node for each one. The virtual nodes are then grouped into maps keyed by the name property, so each node can be looked up by name later.

WITH [gname IN gents | apoc.create.vNode(['gent'],{name:gname})] AS gNodes,
     [rname in registrants | 
       apoc.create.vNode(['Registrant'],{name:rname})] AS rNodes
WITH apoc.map.groupBy(gNodes, 'name') AS gvs,
     apoc.map.groupBy(rNodes, 'name') AS rvs
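
The lookup pattern in Part 2 can be tried in isolation. The sketch below is a hedged illustration rather than part of the original query: the names 'one' and 'other' and the alias byName are made up for the example, and it assumes the same APOC functions used above.

```cypher
// Build two virtual nodes from a literal list of names.
WITH [name IN ['one', 'other'] |
       apoc.create.vNode(['vnode'], {name:name})] AS vnodes
// Key each virtual node by its name property.
WITH apoc.map.groupBy(vnodes, 'name') AS byName
// Look up individual virtual nodes by name, as Part 4 does with rvs and gvs.
RETURN byName['one'] AS one,
       byName['other'] AS other
```

Because apoc.map.groupBy keeps a single entry per key, this approach relies on the names being unique, which COLLECT(DISTINCT ...) ensures in Part 1.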

Part 3

Re-query the nodes from which the virtual nodes were created.

MATCH (r:Registrant)-[:FILED]->(f:Filing)-[:TARGETED_AT]->(g:GovernmentEntity)
WITH gvs, rvs, r, f, g, SUM (f.amount) AS amt,
     apoc.date.fields(LEFT(f.receivedOn, 10), 'yyyy-MM-dd') AS received
WHERE received.years = 2018 AND 
      received.months = 3 AND
      amt > 100000 AND
      g.name <> 'SENATE' AND 
      g.name <> 'HOUSE OF REPRESENTATIVES'

Part 4

Create a virtual relationship between the registrant and government entity directly, adding properties which aggregate the filings in a useful way.

RETURN rvs,
       gvs,
       apoc.create.vRelationship (rvs[r.name], 'LOBBIED',
          {filingCnt:COUNT(f), filingAmt:SUM(f.amount)}, 
          gvs[g.name]) AS rel

Visualization

The results are much easier to understand when the explicit filings are removed and aggregates are included as properties on the [:LOBBIED] relationships. In the Neo4J browser, select a relationship to see the total number of filings and dollar amount spent by the registrant.

Solution #2

Virtual nodes are useful when the persisted nodes are transformed into something more useful than the base node. Solution #1 created them to demonstrate how, but they aren't actually required, since virtual relationships can connect persisted nodes.

The following Cypher gets the same results without creating virtual nodes.

MATCH (r:Registrant)-[:FILED]->(f:Filing)-[:TARGETED_AT]->(g:GovernmentEntity)
WITH r, f, g, SUM(f.amount) AS amt,
     apoc.date.fields(LEFT(f.receivedOn, 10), 'yyyy-MM-dd') AS received
WHERE received.years = 2018 AND
      received.months = 3 AND
      amt > 100000 AND
      g.name <> 'SENATE' AND
      g.name <> 'HOUSE OF REPRESENTATIVES'
RETURN r, g,
       apoc.create.vRelationship (r, 'LOBBIED',
          {filingCnt:COUNT(f), filingAmt:SUM(f.amount)}, g) AS rel

Conclusion

Transforming a Neo4J schema inline with virtual nodes and relationships provides insights into your data that were not readily available from the original, persisted data. While this article focused on simplified visualizations, transformed tabular results can also be generated by using Cypher's ability to chain queries together (the WITH clause) and by including the virtual nodes/relationships in the chain.
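
As a rough sketch of the tabular case, the query below chains a virtual relationship through WITH and returns rows instead of a graph. It is an assumption-laden illustration, not the article's query: the date and amount filters are omitted for brevity, the column aliases are made up, and it assumes your APOC version provides apoc.any.property for reading properties from virtual relationships.

```cypher
MATCH (r:Registrant)-[:FILED]->(f:Filing)-[:TARGETED_AT]->(g:GovernmentEntity)
// Aggregate per registrant/entity pair before building the relationship.
WITH r, g, COUNT(f) AS filingCnt, SUM(f.amount) AS filingAmt
// Carry the virtual relationship along the chain with its endpoints.
WITH r, g,
     apoc.create.vRelationship(r, 'LOBBIED',
        {filingCnt:filingCnt, filingAmt:filingAmt}, g) AS rel
// Read the aggregates back from the virtual relationship as table columns.
RETURN r.name AS registrant,
       g.name AS governmentEntity,
       apoc.any.property(rel, 'filingCnt') AS filingCnt,
       apoc.any.property(rel, 'filingAmt') AS filingAmt
ORDER BY filingAmt DESC
```

Computing the aggregates in their own WITH clause keeps the grouping keys explicit, which may matter on Neo4J versions that are stricter about mixing aggregates with other expressions.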
