DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Introducing Graph Concepts in Java With Eclipse JNoSQL
  • Practical Generators in Go 1.23 for Database Pagination
  • Java and MongoDB Integration: A CRUD Tutorial [Video Tutorial]
  • Mastering Persistence: Why the Persistence Layer Is Crucial for Modern Java Applications

Trending

  • Building a Zero-Cost Approval Workflow With AWS Lambda Durable Functions
  • Alternative Structured Concurrency
  • RAG Is Not Enough: Advanced Retrieval Architectures Using Vertex AI Search on GCP
  • Mocking Kafka for Local Spring Development
  1. DZone
  2. Data Engineering
  3. Data
  4. Analyzing Relationships in Game of Thrones With NetworkX, Gephi, and Nebula Graph (Part Two)

Analyzing Relationships in Game of Thrones With NetworkX, Gephi, and Nebula Graph (Part Two)

In this post, we will show you how to access data in Nebula Graph by using NetworkX.

By 
Jamie Liu user avatar
Jamie Liu
·
Sep. 11, 20 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
8.1K Views

Join the DZone community and get the full member experience.

Join For Free

In the last post, we showed the character relationship for the Game of Thrones by using NetworkX and Gephi. In this post, we will show you how to access data in Nebula Graph by using NetworkX.

NetworkX

NetworkX is a modeling tool for the graph theory and complex networks written by Python. With its rich, easy-to-use built-in graphs and analysis algorithms, it’s easy to perform complex network analysis and simulation modeling.

In NetworkX, a graph (network) is a collection of nodes together with a collection of edges. Attributes are often associated with nodes and/or edges and are optional. Nodes represent data. Edges are uniquely determined by two nodes, representing the relationship between the two nodes. Nodes and edges can also have many attributes so that more information can be stored.

The following four basic graph types are provided in NetworkX:

  • Graph: undirected graph.
  • DiGraph: directed graph.
  • MultiGraph: A flexible graph class that allows multiple undirected edges between pairs of nodes.
  • MultiDiGraph: A directed version of a MultiGraph.

Create an undirected graph in NetworkX:

Java
 




xxxxxxxxxx
1


 
1
import networkx as nx
2
G = nx.Graph()


Add nodes:

Java
 




xxxxxxxxxx
1


 
1
G.add_node(1)
2
G.add_nodes_from([2,3,4])
3
G.add_node(2,name='Tom',age=23)


Add edges:

Java
 




xxxxxxxxxx
1


 
1
G.add_edge(2,3)
2
G.add_edges_from([(1,2),(1,3)])
3
g.add_edge(1, 2, start_year=1996, end_year=2019)


In the last post (part one of this series), we have displayed the community detection algorithm Girvan-Newman provided by NetworkX.

Graph Database Nebula Graph

NetworkX usually uses local files as the data source, which is totally okay for static network researches. But when the graph network changes a lot, for example, some central nodes are deleted or important network topology changes are introduced, it is a little troublesome to generate, load, and analyze the new static files. At this time, we’d better persist the entire change process in a database, and load subgraphs or whole graphs directly from the database for real-time analysis. This post uses Nebula Graph as the graph database for storing the graph data.

Nebula Graph provides two methods to get graph structures:

  1. Query to return a subgraph.
  2. Scan the underlying storage to get the whole graph.

The first method is suitable for obtaining certain nodes and edges that meet the requirements through fine filtering and pruning in a large-scale graph network. Whereas the second method is more suitable for the whole graph analysis. It is usually used in the early stage of the project when conducting heuristic exploration. When you further understand the graph, you can use the first method for fine pruning analysis.

By reading the code of the python client for Nebula Graph, we know that the nebula-python/nebula/ngStorage/StorageClient.py and the nebula-python/nebula/ngMeta/MetaClient.py file are the APIs that interact with the underlying storage. These files contain a series of interfaces to scan nodes, scan edges, read columns, and so on. The following two interfaces can be used to scan all nodes and edges:

Java
 




x


 
1
def scanVertex(self, space, returnCols, allCols, limit, startTime, endTime)
2
def scanEdge(self, space, returnCols, allCols, limit, startTime, endTime)


1. Initialize a client and a ScanEdgeProcessor. The ScanEdgeProcessor decodes the edge data read from Nebula Graph:

Java
 




xxxxxxxxxx
1


 
1
metaClient = MetaClient([('192.168.8.16', 45500)])
2
metaClient.connect()
3
storageClient = StorageClient(metaClient)
4
scanEdgeProcessor = ScanEdgeProcessor(metaClient)


2. Initialize the parameters for the scanEdge interface:

Java
 




xxxxxxxxxx
1


 
1
spaceName = 'nba' # The name of the graph space to be read.
2
returnCols = {} # The edges (or nodes) and their property columns to be returned.
3
returnCols['serve'] = ['start_year', 'end_year']
4
returnCols['follow'] = ['degree']
5
allCols = False # Return all the property columns or not. When set to False, only the specified property columns in the returnCols are returned. When set to True, all property columns are returned.
6
limit = 100 # Maximum data entry number returned.
7
startTime = 0
8
endTime = sys.maxsize


3. Call the scanPartEdge interface to return the iterator for the canEdgeResponse object:

Java
 




xxxxxxxxxx
1


 
1
scanEdgeResponseIterator = storageClient.scanEdge(spaceName, returnCols, allCols, limit, startTime, endTime)


4. Keep reading the data in the ScanEdgeResponse object which the iterator points to till all the data are read:

Java
 




xxxxxxxxxx
1


 
1
while scanEdgeResponseIterator.hasNext():
2
    scanEdgeResponse = scanEdgeResponseIterator.next()
3
    if scanEdgeResponse is None:
4
        print("Error occurs while scaning edge")
5
        break
6
    processEdge(space, scanEdgeResponse)


In the preceding snippet, the process_edge is a custom function for processing the scanned edge data. This function first uses the scan_edge_processor to decode the data in the scan_edge_response. The decoded data can either be directly printed out or processed for other purposes such as importing these data into the calculation framework NetworkX.

5. Process data. Here we import all the scanned data into Graph G in the NetworkX:

Java
 




xxxxxxxxxx
1
14


 
1
def processEdge(space, scanEdgeResponse):
2
    result = scanEdgeProcessor.process(space, scanEdgeResponse)
3
    # Get the corresponding rows by edge_name
4
    for edgeName, edgeRows in result.rows.items():
5
        for row in edgeRows:
6
            srcId = row.defaultProperties[0].getValue()
7
            dstId = row.defaultProperties[2].getValue()
8
            print('%d -> %d' % (srcId, dstId))
9
            props = {}
10
            for prop in row.properties:
11
                propName = prop.getName()
12
                propValue = prop.getValue()
13
                props[propName] = propValue
14
            G.add_edges_from([(srcId, dstId, props)]) # Add edges to Graph G in the NetworkX


Nodes scanning is similar to the preceding process.

In addition, for some of the distributed graph calculation frameworks, Nebula Graph provides batch reading and storing based on the partitions. We will show you in the later post.

Applying Graph Analysis in NetworkX

When importing all the nodes and edges into the NetworkX according to the preceding process, we can do some basic graph analysis and calculations:

1. Draw a plot:

Java
 




xxxxxxxxxx
1


 
1
nx.draw(G, with_labels=True, font_weight='bold')
2
import matplotlib.pyplot as plt
3
plt.show()
4
plt.savefig('./test.png')


The plot you just drew looks like following:

2. Print out all the nodes and edges in the plot:

Java
 




xxxxxxxxxx
1


 
1
print('nodes: ', list(G.nodes))
2
print('edges: ', list(G.edges))


Results printed:

Java
 




xxxxxxxxxx
1


 
1
nodes:  [109, 119, 129, 139, 149, 209, 219, 229, 108, 118, 128, 138, 148, 208, 218, 228, 107, 117, 127, 137, 147, 207, 217, 227, 106, 116, 126, 136, 146, 206, 216, 226, 101, 111, 121, 131, 141, 201, 211, 221, 100, 110, 120, 130, 140, 150, 200, 210, 220, 102, 112, 122, 132, 142, 202, 212, 222, 103, 113, 123, 133, 143, 203, 213, 223, 104, 114, 124, 134, 144, 204, 214, 224, 105, 115, 125, 135, 145, 205, 215, 225]
2
edges:  [(109, 100), (109, 125), (109, 204), (109, 219), (109, 222), (119, 200), (119, 205), (119, 113), (129, 116), (129, 121), (129, 128), (129, 216), (129, 221), (129, 229), (129, 137), (139, 138), (139, 212), (139, 218), (149, 130), (149, 219), (209, 123), (219, 130), (219, 112), (219, 104), (229, 147), (229, 116), (229, 141), (229, 144), (108, 100), (108, 101), (108, 204), (108, 206), (108, 214), (108, 215), (108, 222), (118, 120), (118, 131), (118, 205), (118, 113), (128, 116), (128, 121), (128, 201), (128, 202), (128, 205), (128, 223), (138, 115), (138, 204), (138, 210), (138, 212), (138, 221), (138, 225), (148, 127), (148, 136), (148, 137), (148, 214), (148, 223), (148, 227), (148, 213), (208, 127), (208, 103), (208, 104), (208, 124), (218, 127), (218, 110), (218, 103), (218, 104), (218, 114), (218, 105), (228, 146), (228, 145), (107, 100), (107, 204), (107, 217), (107, 224), (117, 200), (117, 136), (117, 142), (127, 114), (127, 212), (127, 213), (127, 214), (127, 222), (127, 226), (127, 227), (137, 136), (137, 213), (137, 150), (147, 136), (147, 214), (147, 223), (207, 121), (207, 140), (207, 122), (207, 134), (217, 126), (217, 141), (217, 124), (217, 144), (106, 204), (106, 212), (106, 113), (116, 141), (116, 126), (116, 210), (116, 216), (116, 121), (116, 113), (116, 105), (126, 216), (136, 210), (136, 213), (136, 214), (146, 202), (146, 210), (146, 215), (146, 222), (146, 226), (206, 123), (216, 144), (216, 105), (226, 140), (226, 112), (226, 114), (226, 144), (101, 100), (101, 102), (101, 125), (101, 204), (101, 215), (101, 113), (101, 104), (111, 200), (111, 204), (111, 215), (111, 220), (121, 202), (121, 215), (121, 113), (121, 134), (131, 205), (131, 220), (141, 124), (141, 205), (141, 225), (201, 145), (211, 124), (221, 104), (221, 124), (100, 125), (100, 204), (100, 102), (100, 113), (100, 104), (100, 144), (100, 105), (110, 204), (110, 220), (120, 150), (120, 202), (120, 205), (120, 113), (140, 114), (140, 214), (140, 224), (150, 143), (150, 213), (200, 142), (200, 104), (200, 145), (210, 124), (210, 144), (210, 115), (210, 145), (102, 203), (102, 204), (102, 103), (102, 135), (112, 204), (122, 213), (122, 223), (132, 225), (202, 133), (202, 114), (212, 103), (222, 104), (103, 204), (103, 114), (113, 104), (113, 105), (113, 125), (113, 204), (133, 114), (133, 144), (143, 213), (143, 223), (203, 135), (213, 124), (213, 145), (104, 105), (104, 204), (104, 215), (114, 115), (114, 204), (134, 224), (144, 145), (144, 214), (204, 105), (204, 125)]


3. One of the common application is to calculate the shortest path between two nodes:

Java
 




xxxxxxxxxx
1


 
1
p1 = nx.shortest_path(G, source=114, target=211)
2
print('The shortest path between Node 114 and Node 211: ', p1)


Results printed:

Java
 




xxxxxxxxxx
1


 
1
The shortest path between Node 114 and Node 211:  [114, 127, 208, 124, 211]


4. Calculate the pagerank value for each node in the graph to see their influences:

Java
 




xxxxxxxxxx
1


 
1
print(nx.pagerank(G))


Results printed:

Java
 




xxxxxxxxxx
1


 
1
{109: 0.011507076520104863, 119: 0.007835838669313514, 129: 0.015304593799331218, 139: 0.007772926737873626, 149: 0.0073896601012629825, 209: 0.0065558926178649985, 219: 0.014100908598251508, 229: 0.011454115940170253, 108: 0.01645334474680034, 118: 0.01010598371500564, 128: 0.01594717876199238, 138: 0.01671097227127263, 148: 0.015898676579503977, 208: 0.009437234075904938, 218: 0.0153795416919104, 228: 0.005900393773635255, 107: 0.009745182763645681, 117: 0.008716335675518244, 127: 0.021565565312365507, 137: 0.011642680498867146, 147: 0.009721031073465738, 207: 0.01040504770909835, 217: 0.012054472529765329, 227: 0.005615576255373405, 106: 0.007371191843767635, 116: 0.020955704443679106, 126: 0.007589432032220849, 136: 0.015987209357117116, 146: 0.013922108926721374, 206: 0.008554794629575304, 216: 0.011219193251536395, 226: 0.013613173390725904, 101: 0.016680863106330837, 111: 0.010121524312495604, 121: 0.017545503989576015, 131: 0.008531567756846938, 141: 0.014598319866130227, 201: 0.0058643663430632525, 211: 0.003936285336338021, 221: 0.009587911774927793, 100: 0.02243017302167168, 110: 0.007928429795381916, 120: 0.011875669801396205, 130: 0.0073896601012629825, 140: 0.01205992633948699, 150: 0.010045605782606326, 200: 0.015289870550944322, 210: 0.017716629501785937, 220: 0.008666577509181518, 102: 0.014865431161046641, 112: 0.007931095811770324, 122: 0.008087439927630492, 132: 0.004659566123187912, 142: 0.006487446038191551, 202: 0.013579313206377282, 212: 0.01190888044566142, 222: 0.011376739416933006, 103: 0.013438110749144392, 113: 0.02458154500563397, 123: 0.01104978432213578, 133: 0.00743370900670294, 143: 0.008011123394996112, 203: 0.006883198710237787, 213: 0.020392557117890422, 223: 0.012345866520333572, 104: 0.024902235588979776, 114: 0.019369722463816744, 124: 0.017165705442951484, 134: 0.008284361176173354, 144: 0.019363506469972095, 204: 0.03507634139024834, 214: 0.015500649025348538, 224: 0.008320315540621754, 105: 0.01439975542831122, 115: 0.007592722237637133, 125: 0.010808523955754608, 135: 0.006883198710237788, 145: 0.014654713389044883, 205: 0.014660118545887803, 215: 0.01337467974572934, 225: 0.009909720748343093}


In addition, similar to the first post in the series, you can access to Gephi to make the visualization look better.

The code used in this post can be accessed here.

Graph (Unix) NetworkX Nebula (computing) Java (programming language) Database Gephi Data (computing)

Published at DZone with permission of Jamie Liu. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Introducing Graph Concepts in Java With Eclipse JNoSQL
  • Practical Generators in Go 1.23 for Database Pagination
  • Java and MongoDB Integration: A CRUD Tutorial [Video Tutorial]
  • Mastering Persistence: Why the Persistence Layer Is Crucial for Modern Java Applications

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook