Nebula Graph Source Code Explained via a Sample Graph Query
In this article, take a look at the Nebula Graph source code and see a sample graph query.
When I saw the Nebula Graph code repository for the first time, I was so shocked by its huge size that I didn’t know how to dig into the source code. Then I worked up the nerve. After reading the code and running the use cases over and over, I gained some experience worth sharing. I hope it encourages you to give the Nebula Graph source code a shot: to learn more about this graph DBMS, to improve your graph database knowledge, and to fix some of the less complicated bugs in the repository.
In this article, I use the SHOW SPACES statement as an example to show how Nebula Graph processes an nGQL statement after it is entered on the client side. GDB, the GNU Project debugger, is used to trace the execution.
Additionally, some open-source libraries are used in Nebula Graph. For more information, see the Libraries section.
Architecture of Nebula Graph
A complete Nebula Graph DBMS contains three services: Query Service, Storage Service, and Meta Service. Each service has its own executable binary files.
Query Service is responsible for these tasks:
- Managing connection to the client
- Parsing an nGQL statement input from a client into an AST (Abstract Syntax Tree) and then converting the AST into an execution plan
- Optimizing the execution plan
- Executing queries with the optimized execution plan
Storage Service is responsible for distributed data storage.
Meta Service is responsible for these tasks:
- Performing CRUD operations on graph schema objects
- Managing the cluster
- Performing user authentication
In this article, I use Query Service as the example.
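The four Query Service tasks listed above form a pipeline, which can be pictured with the following toy sketch. Every name in it (`Ast`, `Plan`, `parse`, `makePlan`, `optimize`, `execute`, `handleStatement`) is hypothetical and made up for illustration; none of them comes from the Nebula Graph code base, and the real stages are far richer.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the Query Service stages; none of these
// names come from the Nebula Graph code base.
struct Ast { std::string statement; };
struct Plan { std::vector<std::string> steps; };

// Stage 1: turn the statement text into an AST (here: just wrap it).
Ast parse(const std::string& stmt) {
    Ast ast;
    ast.statement = stmt;
    return ast;
}

// Stage 2: turn the AST into an execution plan.
Plan makePlan(const Ast& ast) {
    Plan plan;
    plan.steps.push_back("validate: " + ast.statement);
    plan.steps.push_back("noop");
    plan.steps.push_back("run: " + ast.statement);
    return plan;
}

// Stage 3: optimize the plan (here: drop steps that do nothing).
Plan optimize(const Plan& plan) {
    Plan optimized;
    for (const auto& step : plan.steps) {
        if (step != "noop") {
            optimized.steps.push_back(step);
        }
    }
    return optimized;
}

// Stage 4: execute the optimized plan and report what was run.
std::string execute(const Plan& plan) {
    std::string log;
    for (const auto& step : plan.steps) {
        log += step + ";";
    }
    return log;
}

// Chains the four stages in the order listed above.
std::string handleStatement(const std::string& stmt) {
    return execute(optimize(makePlan(parse(stmt))));
}
```

Calling `handleStatement("SHOW SPACES")` runs the statement through all four stages in order; the rest of this article traces the real counterparts of these stages in GDB.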
Source Code Directory Hierarchy
After downloading and unpacking the source package, check the source code directory hierarchy first. Each subdirectory serves its own purpose. Here is what the src directory looks like.
|--src
  |--client // Provides the code for the client
  |--common // Provides some common basic components
  |--console
  |--daemons
  |--dataman
  |--graph // Contains most of the code of Query Service
  |--interface // Contains the communication interfaces for the meta, storage, and query services
  |--jni
  |--kvstore
  |--meta // Relates to meta service information
  |--parser // Contains modules for lexical analysis (lexer) and syntax analysis (parser)
  |--storage // Contains the code for the storage layer
  |--tools
  |--webservice
Code Tracing
In the scripts directory, use the scripts to start the metad and the storaged services.
When the services are started, run nebula.service status all to check the service status.
Start GDB and run the nebula-graphd binary program, which is in the bin directory.
gdb> set args --flagfile /home/mingquan.ji/1.0/nebula-install/etc/nebula-graphd.conf //specify the arguments
gdb> set follow-fork-mode child // This is a daemon, so the new process is debugged after a fork and the parent process runs unimpeded.
gdb> b main // Set a breakpoint at entry to main
Use the run command to start the nebula-graphd program under GDB, and then use the next command to execute the code line by line until execution reaches the gServer->serve(); // Blocking wait until shut down via gServer->stop() line. At this point, the thread that accepts client connections blocks and the server waits for a connection, so we need to find the function that processes the requests sent from the client.
Nebula Graph uses FBThrift to define the communication interfaces for different services. In the src/interface/graph.thrift file, you can find the communication interface definition for GraphService as follows.
service GraphService {
    AuthResponse authenticate(1: string username, 2: string password)
    oneway void signout(1: i64 sessionId)
    ExecutionResponse execute(1: i64 sessionId, 2: string stmt)
}
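To make the shape of this interface concrete, here is a minimal in-memory sketch of a class with the same three operations. It is not the FBThrift-generated code: `MockGraphService`, `AuthResult`, and `ExecResult` are hypothetical names, their fields are assumptions (the real AuthResponse and ExecutionResponse fields are not shown in this article), and the rule that any non-empty credentials are accepted is invented for the sketch.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical response types: the real AuthResponse and ExecutionResponse
// are defined in graph.thrift, and their fields are not shown in this article.
struct AuthResult {
    bool ok;
    std::int64_t sessionId;
};

struct ExecResult {
    bool ok;
    std::string message;
};

// In-memory mock with the same three operations as the Thrift service.
class MockGraphService {
public:
    AuthResult authenticate(const std::string& username, const std::string& password) {
        // Assumption for this sketch: any non-empty credentials are accepted.
        if (username.empty() || password.empty()) {
            return AuthResult{false, 0};
        }
        std::int64_t id = nextId_++;
        sessions_.insert(id);
        return AuthResult{true, id};
    }

    // Mirrors the oneway call: nothing is reported back to the caller.
    void signout(std::int64_t sessionId) {
        sessions_.erase(sessionId);
    }

    ExecResult execute(std::int64_t sessionId, const std::string& stmt) {
        // A statement is only accepted for a session that authenticate() created.
        if (sessions_.count(sessionId) == 0) {
            return ExecResult{false, "invalid session"};
        }
        return ExecResult{true, "received: " + stmt};
    }

private:
    std::int64_t nextId_ = 1;
    std::unordered_set<std::int64_t> sessions_;
};
```

The point of the sketch is the call order a client follows: authenticate first to obtain a session ID, pass that ID with every execute call, and sign out when done.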
The gServer->serve() line is preceded by the following lines.
auto interface = std::make_shared<GraphService>();
status = interface->init(ioThreadPool);
gServer->setInterface(std::move(interface));
gServer->setAddress(localIP, FLAGS_port);
From this code, we know that the GraphService object is the one that processes the connections and requests sent from the client, so we can set a breakpoint at GraphService.cpp:future_execute to trace the execution.
Now, let’s launch another terminal and change to the nebula installation directory. Run ./nebula -u=root -p=nebula to connect to the nebula services, and then run the SHOW SPACES statement. You will see that no result is returned, because the service is paused in the debugger on the server side. Let’s go back to the server side and run the continue command so that execution proceeds to the breakpoint.
After the session is verified, go to the executionEngine->execute() call and run the step command to step into the function.
auto plan = new ExecutionPlan(std::move(ectx));
plan->execute();
Run the step command to step into the execute function of ExecutionPlan, and then continue to the following line.
auto result = GQLParser().parse(rctx->query());
The parser module is built mainly with Flex and Bison. Flex, which matches the input against regular-expression patterns, divides the input statement into tokens; its rules are defined in the src/parser/scanner.lex file. Bison parses the tokens into an AST according to the grammar in the src/parser/parser.yy file. A grammar rule and its action look as follows.
go_sentence
    : KW_GO step_clause from_clause over_clause where_clause yield_clause {
        auto go = new GoSentence();
        go->setStepClause($2);
        go->setFromClause($3);
        go->setOverClause($4);
        go->setWhereClause($5);
        if ($6 == nullptr) {
            auto *cols = new YieldColumns();
            for (auto e : $4->edges()) {
                if (e->isOverAll()) {
                    continue;
                }
                auto *edge = new std::string(*e->edge());
                auto *expr = new EdgeDstIdExpression(edge);
                auto *col = new YieldColumn(expr);
                cols->addColumn(col);
            }
            $6 = new YieldClause(cols);
        }
        go->setYieldClause($6);
        $$ = go;
    }
When a GO statement is matched, the action constructs the corresponding AST node (a GoSentence) and fills in its clauses, and Bison combines such nodes into the complete AST.
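The division of labor between the lexer and the parser can be illustrated with a small hand-written sketch for the SHOW SPACES statement. This is not the code that Flex and Bison generate: the `Token` enum, `ShowSentence`, `tokenize`, and `parseShow` are hypothetical names (the token names are only modeled on KW_GO in the rule above), and treating keywords as case-insensitive is an assumption of the sketch.

```cpp
#include <cctype>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token and AST types; the real ones are generated from
// scanner.lex and parser.yy and are much richer than this.
enum class Token { KW_SHOW, KW_SPACES, UNKNOWN };

struct ShowSentence {
    std::string target;  // what to show, for example "SPACES"
};

// Lexer step: split the statement on whitespace and map each word to a
// token. Assumption of this sketch: keywords are matched case-insensitively.
std::vector<Token> tokenize(const std::string& stmt) {
    std::vector<Token> tokens;
    std::istringstream in(stmt);
    std::string word;
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        if (word == "SHOW") {
            tokens.push_back(Token::KW_SHOW);
        } else if (word == "SPACES") {
            tokens.push_back(Token::KW_SPACES);
        } else {
            tokens.push_back(Token::UNKNOWN);
        }
    }
    return tokens;
}

// Parser step: match the rule "show_sentence : KW_SHOW KW_SPACES" and
// build the AST node, which is the job a Bison action does in parser.yy.
std::unique_ptr<ShowSentence> parseShow(const std::vector<Token>& tokens) {
    if (tokens.size() != 2 || tokens[0] != Token::KW_SHOW || tokens[1] != Token::KW_SPACES) {
        throw std::runtime_error("syntax error");
    }
    std::unique_ptr<ShowSentence> node(new ShowSentence());
    node->target = "SPACES";
    return node;
}
```

A call such as `parseShow(tokenize("show spaces"))` returns a node whose `target` is "SPACES", while an incomplete statement such as "SHOW" is rejected with an exception.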
After lexical analysis and parsing are done, the execution module takes over. Still inside GDB, step into the execute function, run the step command repeatedly, and stop at the ShowExecutor::execute line.
Run the next command line by line, and when execution reaches the showSpaces() function, run the step command to step into it.
auto future = ectx()->getMetaClient()->listSpaces();
auto *runner = ectx()->rctx()->runner();
'''
'''
std::move(future).via(runner).thenValue(cb).thenError(error);
From the code above, we see that Query Service obtains the spaces data through communication between metaClient and Meta Service, and then uses the cb callback to return the data. At this point, the SHOW SPACES statement has been executed completely. Other nGQL statements, even more complicated ones, are executed in a similar way.
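The pattern in showSpaces() (request the data asynchronously, hand the continuation to a runner, and route the outcome to either a value callback or an error callback) can be sketched without folly. The sketch below deliberately swaps folly futures for plain callbacks and a single-threaded task queue. `Runner`, this `listSpaces` signature, the `fail` switch, and the canned space names are all hypothetical and exist only for illustration.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Spaces = std::vector<std::string>;

// Hypothetical single-threaded stand-in for the runner (executor):
// tasks are queued by add() and executed later by drain().
class Runner {
public:
    void add(std::function<void()> task) {
        tasks_.push_back(std::move(task));
    }

    void drain() {
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical stand-in for the meta client call. Instead of an RPC to
// Meta Service, it schedules either the value callback (with canned data)
// or the error callback on the runner.
void listSpaces(Runner& runner, bool fail,
                std::function<void(const Spaces&)> cb,
                std::function<void(const std::string&)> error) {
    runner.add([fail, cb, error]() {
        if (fail) {
            error("meta service unreachable");
        } else {
            cb(Spaces{std::string("nba"), std::string("test")});
        }
    });
}
```

As in the real code, the caller returns immediately after scheduling the work; neither callback fires until the runner gets around to the task, which in this sketch happens when `drain()` is called.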
- For a running service, it is recommended that you get the process ID and then run the gdb attach PID command to debug this process.
- If you don’t want to launch both the server and the client for debugging, you can use the test directory. Each functional module under the src directory has its own test directory, which contains all the unit-test code for that module. This code can be used to compile the module on its own, so that its execution can be traced. The test directory can be used as follows:
  - Under the directory of a functional module, find its CMakeLists.txt file and find the module name in this file.
  - In the build directory, run the make <module name> command. The corresponding binary program is generated in the build/bin/test directory.
  - Start GDB to debug and trace the execution.
Libraries
Before reading the Nebula Graph source code, you may need to know something about these libraries:
- Flex and Bison: tools used for lexical analysis and syntax analysis. They parse the input nGQL statements into an AST.
- FBThrift: an open-source RPC framework, developed by Facebook. It defines the communication interfaces among the Meta, Storage, and Graph layers of the Nebula Graph DBMS.
- folly: an open-source library of C++14 components, developed by Facebook. It offers functionality similar to the Boost and std libraries, but with optimized performance.
- Gtest: an open-source framework for C++ unit testing, developed by Google.
Welcome to contribute to Nebula Graph on GitHub. Find the repo here. If you have any questions, feel free to raise them on the official forum.
Published at DZone with permission of Jamie Liu. See the original article here.