
Nebula Graph Source Code Explained via a Sample Graph Query

In this article, take a look at the Nebula Graph source code and see a sample graph query.


When I first saw the Nebula Graph code repository, I was so shocked by its sheer size that I didn't know where to start digging into the source code. Then I worked up the nerve. After reading the code and running the use cases over and over, I finally gained some experience worth sharing, hoping it encourages you to give the Nebula Graph source code a shot, learn more about the graph DBMS, improve your graph database knowledge, and perhaps fix some of the less complicated bugs in this repository.

In this article, I take the SHOW SPACES statement as an example to show you how Nebula Graph processes an nGQL statement after it is input on the client side. GDB, the GNU Project debugger, is used to trace the execution.

Additionally, some open-source libraries are used in Nebula Graph. For more information, see the Libraries section.

Architecture of Nebula Graph


A complete Nebula Graph DBMS contains three services: Query Service, Storage Service, and Meta Service. Each service has its own executable binary files.

Query Service is responsible for these tasks:

  • Managing connection to the client
  • Parsing an nGQL statement input from a client into an AST (Abstract Syntax Tree) and then parsing the AST to an execution plan
  • Optimizing the execution plan
  • Executing queries with the optimized execution plan

Storage Service is responsible for the distributed data store.

Meta Service is responsible for these tasks:

  • Operating CRUD on graph schema objects
  • Managing the cluster
  • Performing user authentication

Here, I use Query Service as the example.

Source Code Directory Hierarchy

After we get the source packages and unzip them, we should take a look at the source code directory hierarchy. Each package has its own functions. Here is what the src directory looks like.

Shell

|--src
    |--client     // Provides the code for the client
    |--common     // Provides some common basic components
    |--console
    |--daemons
    |--dataman
    |--graph      // Contains most of the Query Service code
    |--interface  // Contains communication interfaces for the meta, storage, and query services
    |--jni
    |--kvstore
    |--meta       // Relates to Meta Service information
    |--parser     // Contains modules for lexical parsing (Lexer) and semantic analysis
    |--storage    // Contains code for the storage layer
    |--tools
    |--webservice

Code Tracing

In the scripts directory, use the scripts to start the metad and the storaged services.

When the services are started, run nebula.service status all to check the service status.

Start GDB and run the nebula-graphd binary program, which is in the bin directory.

Shell

gdb> set args --flagfile /home/mingquan.ji/1.0/nebula-install/etc/nebula-graphd.conf   // Specify the arguments
gdb> set follow-fork-mode child   // This is a daemon, so the new process is debugged after a fork and the parent process runs unimpeded
gdb> b main                       // Set a breakpoint at entry to main


Use the run command to start the nebula-graphd program under GDB, and then use the next command to execute the code line by line until execution stops at the gServer->serve(); // Blocking wait until shut down via gServer->stop() line. This means the thread that receives client connections is blocked while the server waits for a connection, so we need to find the function that processes requests sent from the client.

Nebula Graph uses FBThrift to define the communication interfaces for different services, and in the src/interface/graph.thrift file, you can find the communication interface definition for GraphService as follows.

Thrift

service GraphService {
    AuthResponse authenticate(1: string username, 2: string password)
    oneway void signout(1: i64 sessionId)
    ExecutionResponse execute(1: i64 sessionId, 2: string stmt)
}

The gServer->serve() line is preceded by the following lines.

C++

auto interface = std::make_shared<GraphService>();
status = interface->init(ioThreadPool);
gServer->setInterface(std::move(interface));
gServer->setAddress(localIP, FLAGS_port);


From this code, we know that the GraphService object is the one that processes connections and requests sent from the client, so we can set a breakpoint at the GraphService.cpp:future_execute line to trace the execution.

Now, let's launch another terminal and change to the nebula installation directory. Run ./nebula -u=root -p=nebula to connect to the nebula services, and then run the SHOW SPACES statement. You will see that no result is returned, because the services are blocked for debugging on the server side. Let's go back to the server side and run the continue command, and the following lines are returned.

After the session is verified, go to executionEngine->execute() and run the step command to step inside the function.

C++

auto plan = new ExecutionPlan(std::move(ectx));
plan->execute();


Run the step command to step inside the execute function of ExecutionPlan, where you will reach the following line.

C++

auto result = GQLParser().parse(rctx->query());

The parse module is mainly composed of Flex and Bison. Flex, using rules that work like regular expressions, divides the input statement into tokens, with the src/parser/scanner.lex file serving as the lexical definition file. Bison parses the tokens into an AST, with the src/parser/parser.yy file used for semantic analysis. The semantic analysis works as follows.

Bison

go_sentence
    : KW_GO step_clause from_clause over_clause where_clause yield_clause {
        auto go = new GoSentence();
        go->setStepClause($2);
        go->setFromClause($3);
        go->setOverClause($4);
        go->setWhereClause($5);
        if ($6 == nullptr) {
            auto *cols = new YieldColumns();
            for (auto e : $4->edges()) {
                if (e->isOverAll()) {
                    continue;
                }
                auto *edge  = new std::string(*e->edge());
                auto *expr  = new EdgeDstIdExpression(edge);
                auto *col   = new YieldColumn(expr);
                cols->addColumn(col);
            }
            $6 = new YieldClause(cols);
        }
        go->setYieldClause($6);
        $$ = go;
    }

When a GO statement is matched, the applicable nodes are constructed for the AST; Bison then assembles these nodes and generates the AST.

After lexical analysis and semantic analysis are done, the execution module takes over. Still inside GDB, step into the execute function, run the step command line by line, and stop at the ShowExecutor::execute line.


Run the next command line by line, and when execution reaches the showSpaces() function, run the step command to step inside it.

C++

auto future = ectx()->getMetaClient()->listSpaces();
auto *runner = ectx()->rctx()->runner();
// ...
std::move(future).via(runner).thenValue(cb).thenError(error);


From the steps above, we see that Query Service obtains the spaces data through communication between metaClient and Meta Service, and then uses the cb callback to return the data. At this point, the SHOW SPACES statement has been executed completely. Other nGQL statements, even more complicated ones, are executed in a similar way.

  • For a running service, it is recommended that you get the process ID and then run the gdb attach PID command to debug the process.
  • If you don’t want to launch both the server and the client for debugging, you can use the test directory. Each function under the src directory has its own test directory, which contains all the code for unit testing the applicable function or module. This code can be used to compile the functional module and trace its execution. The test directory can be used as follows:
    1. Under the directory for a functional module, find its CMakeLists.txt file and find the module name in this file.
    2. In the build directory, run the make <module name> command. The applicable binary program is generated in the build/bin/test directory.
    3. Start GDB to debug and trace the execution.

Libraries

Before reading the Nebula Graph source code, you may need to know something about these libraries:

  1. Flex and Bison: tools used for lexical analysis and semantic analysis. They parse the input nGQL statements into an AST.
  2. FBThrift: an open-source RPC framework, developed by Facebook. It defines the communication process among the Meta, Storage, and Graph layers of Nebula Graph DBMS.
  3. folly: an open-source library of C++14 components, developed by Facebook. It offers functionality like the Boost and std libraries, but with optimized performance.
  4. Gtest: an open-source framework for C++ unit testing, developed by Google.

Welcome to contribute to Nebula Graph on GitHub. Find the repo here. If you have any questions, feel free to raise them on the official forum.


Published at DZone with permission of Jamie Liu. See the original article here.

