Grakn Python Driver and How to Roll Your Own
This article will walk you through the Python driver and provide guidelines on how you can write your own for your language of choice.
At Grakn, we recently released Grakn 1.3, with a slew of new features, bug fixes, and performance enhancements. Included in this release are new gRPC-based drivers for Java, NodeJS, and Python. This article will walk you through the Python driver and provide guidelines on how you can write your own for your language of choice.
Overview
The main reason for rewriting our drivers was a move from REST to gRPC in Grakn. This change has cleaned up our API and should provide performance benefits. Further, all of our available drivers (Java, Node, and Python) now expose the same objects and methods to users, subject to language naming conventions and available types. To maintain this uniformity across the stack, new language drivers should provide the same interface. Note that you will require both gRPC and protobuf support to create a functioning driver, so double check a) that compilers for your language exist, and b) your target language version is compatible with the compiler.
Driver Architecture
We can divide our drivers into five user-facing components:

Grakn — the driver entry point, instantiated with a URI and, optionally, credentials for the Grakn instance, from which we create Sessions.
Session — a connection to a keyspace within the instance, from which we create Transactions.
Transaction — a single database transaction, which may be used to query, close, commit, etc.
Concept — an object representing any database entity (hierarchy of subtypes diagrammed below).
Answer — the result (and subtypes) returned from string queries submitted to the server.
Everything else can be regarded as machinery to make this interface functional.
The above structure is generated from https://github.com/graknlabs/grakn/tree/master/client-python. Roughly, we have Grakn, Session, and Transaction exposed in the top-level package's __init__.py, with gRPC-specific implementation contained in the service sub-package. A TransactionService utilizes the RequestBuilder (which creates the required gRPC messages), the Communicator (which wraps a bidirectional gRPC stream, exposing a one-in, one-out server connection), and the ResponseReader (which converts received gRPC messages into local Python objects). Received objects may be a subtype of Concept, or an Answer subtype. I recommend reading the README and glancing at the code to see what each of these objects exposes.
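To make the nesting of the entry-point components concrete, here is a minimal sketch in Python. This is a hypothetical skeleton, not the real driver: the gRPC machinery behind each layer is omitted entirely, and only the Grakn → Session → Transaction layering is shown.

```python
# Hypothetical skeleton of the user-facing layering; the real driver backs
# each layer with gRPC machinery, which is omitted here entirely.
class Transaction:
    def __init__(self, keyspace, tx_type):
        self.keyspace, self.tx_type = keyspace, tx_type
        self.is_open = True

    def close(self):
        self.is_open = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


class Session:
    def __init__(self, uri, keyspace):
        self.uri, self.keyspace = uri, keyspace

    def transaction(self, tx_type="read"):
        # in the real driver, this opens a bidirectional gRPC stream
        return Transaction(self.keyspace, tx_type)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        pass


class Grakn:
    def __init__(self, uri):
        self.uri = uri

    def session(self, keyspace):
        return Session(self.uri, keyspace)


# usage mirrors the real driver's context-manager style
with Grakn("localhost:48555").session("test") as session:
    with session.transaction("read") as tx:
        assert tx.is_open
```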
gRPC Summary
Key takeaways about gRPC are that it is HTTP 2.0 based, supports bidirectional streaming, defines services and messages using protocol buffer syntax and definitions, and can be compiled to a variety of language stubs which are interfaced to on both the server and client side.
We don't use much of gRPC's advanced functionality, such as channel multiplexing, instead focusing on core RPC functionality and complex, strongly typed messages.
For a slightly longer gRPC introduction, I recommend this.
Our gRPC protocol is defined at https://github.com/graknlabs/grakn/tree/master/client-protocol/proto. We have four .proto files: the gRPC entry point for all Transaction operations is in Session.proto, Keyspace operations are in Keyspace.proto, etc.
Understanding Our gRPC Protocol
The key to implementing a Grakn driver successfully will be understanding how to create and unpack the correct gRPC messages. Many of the methods exposed to users on Concepts (e.g. an AttributeType from the hierarchy above) are really RPC calls to the Grakn server. To pick a simple example, when calling attribute_type.create(), we create a gRPC request to the Grakn instance, which creates an instance of that attribute type and returns it via another gRPC message. The returned message is unpacked and presented to the user as an instance of the Attribute class.
To become familiar with our RPC message formats, we can look at the protobuf definition files found under the client-protocol/proto directory linked above. Here's an excerpt from Session.proto:
service SessionService {
  rpc transaction (stream Transaction.Req) returns (stream Transaction.Res);
}

message Transaction {
  message Req {
    oneof req {
      Open.Req open_req = 1;
      Commit.Req commit_req = 2;
      Query.Req query_req = 3;
      Iter.Req iterate_req = 4;
      GetSchemaConcept.Req getSchemaConcept_req = 5;
      GetConcept.Req getConcept_req = 6;
      GetAttributes.Req getAttributes_req = 7;
      ...
    }
  }
  message Res {
    oneof res {
      Open.Res open_res = 1;
      Commit.Res commit_res = 2;
      Query.Iter query_iter = 3;
      Iter.Res iterate_res = 4;
      GetSchemaConcept.Res getSchemaConcept_res = 5;
      GetConcept.Res getConcept_res = 6;
      GetAttributes.Iter getAttributes_iter = 7;
      ...
    }
  }
  message Iter {
    message Req {
      int32 id = 1;
    }
    message Res {
      oneof res {
        bool done = 1;
        Query.Iter.Res query_iter_res = 2;
        GetAttributes.Iter.Res getAttributes_iter_res = 3;
        Method.Iter.Res conceptMethod_iter_res = 4;
      }
    }
  }
  ...
  message GetAttributes {
    message Req {
      ValueObject value = 1;
    }
    message Iter {
      int32 id = 1;
      message Res {
        Concept attribute = 1;
      }
    }
  }
}
Our main RPC endpoint is the single RPC call named transaction. In practice, we use this endpoint as a bidirectional stream. Because the protobuf messages are typed, we can walk through the protobuf file definition to see how to build the messages we need. To understand what exactly this means, I'll walk through a more advanced example.
Get Attributes by Value
I’m going to break down the messages sent by the following piece of Python code:
# make sure you've run `pip3 install grakn` and have Grakn running
import grakn

client = grakn.Grakn(uri="localhost:48555")
with client.session(keyspace="test") as session:
    with session.transaction(grakn.TxType.READ) as tx:
        iter = tx.get_attributes_by_value("John", grakn.DataType.STRING)
Here, we want to retrieve all the attributes that have the string value "John". The first gRPC message created is a Transaction.Req from Session.proto, which needs to have its getAttributes_req field populated. This field, in turn, has the type GetAttributes.Req, which has a single field called value. This in turn is a ValueObject, which is defined in the Concept.proto file (excerpt below):
message Concept {
  string id = 1;
  BASE_TYPE baseType = 2;
  enum BASE_TYPE {
    ...
    ATTRIBUTE_TYPE = 3;
    ...
  }
  ...
}

message ValueObject {
  oneof value {
    string string = 1;
    bool boolean = 2;
    int32 integer = 3;
    int64 long = 4;
    float float = 5;
    double double = 6;
    int64 date = 7; // time since epoch in milliseconds
  }
}
In this case, the ValueObject needs to have the string field populated with "John."
Phew! In Python, printing the final message to a string we should get something that looks roughly like this:
{ # type Transaction.Req
  getAttributes_req { # type GetAttributes.Req
    value { # type ValueObject (from Concept.proto)
      string: "John"
    }
  }
}
gRPC implementations differ here in how these compound messages are actually composed: in Python, for instance, each of them needs to be instantiated and embedded using CopyFrom or MergeFrom (see the Python protobuf docs).
The message that is returned will be a Transaction.Res. But which field will be populated? You can work this out from our naming conventions: it should be the one with type GetAttributes.Iter. This message has a single field called id.
{ # type Transaction.Res
  getAttributes_iter { # type GetAttributes.Iter
    id: 1
  }
}
Iterating
Great, but how is this useful? Well, the id returned represents an iterator on the server, which we can repeatedly request to retrieve the actual Attribute instances. This can be wrapped up on the client side as a local iterator. In Python, we then retrieve the next element in an iterator by calling next(attribute_iterator):
...
with client.session(keyspace="test") as session:
    with session.transaction(grakn.TxType.READ) as tx:
        attribute_iterator = tx.get_attributes_by_value(...)
        attr = next(attribute_iterator)
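A minimal sketch of such a client-side iterator follows. All names here are illustrative, not the real driver's: send_iterate_req stands in for sending an iterate_req over the transaction stream and returning the unpacked iterate_res, and the "server" is simulated with a plain dict.

```python
class ResponseIterator:
    """Wraps a server-side iterator ID as a local Python iterator.

    send_iterate_req is a stand-in for sending an iterate_req gRPC
    message and returning the unpacked iterate_res.
    """

    def __init__(self, iterator_id, send_iterate_req):
        self.iterator_id = iterator_id
        self._send = send_iterate_req

    def __iter__(self):
        return self

    def __next__(self):
        res = self._send(self.iterator_id)
        if res.get("done"):
            raise StopIteration
        return res["concept"]


# simulate a server holding three attribute concepts under iterator id 1
server_batches = {1: [{"id": "V1"}, {"id": "V2"}, {"id": "V3"}]}


def fake_send(iterator_id):
    batch = server_batches[iterator_id]
    return {"concept": batch.pop(0)} if batch else {"done": True}


it = ResponseIterator(1, fake_send)
ids = [concept["id"] for concept in it]  # → ["V1", "V2", "V3"]
```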
The next(attribute_iterator) call needs to create a new gRPC message with the following format:
{ # type Transaction.Req
  iterate_req { # type Iter.Req
    id: 1 # or whatever the iterator ID is
  }
}
Which returns:
{ # type Transaction.Res
  iterate_res { # type Iter.Res
    getAttributes_iter_res { # type GetAttributes.Iter.Res
      attribute { # type Concept (from Concept.proto)
        id: "VS...",
        baseType: 3
      }
    }
  }
}
Finally, we have the first actual Concept definition, although it has arrived as a gRPC message. We can unpack the id and baseType into local objects and present them to the user. The next time next(attribute_iterator) is called, we repeat the process of making an iterate_req and unpacking the returned message into a local object.
Tips
I thought I'd take a moment to write out some of the hurdles and solutions I came across when implementing the Python driver.
Circular Dependencies
Unless you create a monolithic driver, you're more than likely to split your code into several modules that will have circular dependencies. Intuitively, local Concept objects may access the server and create other Concepts. Thus, Concepts depend on a networking component, which in turn depends on Concept: a stateful circular dependency.

For example, in the Python driver, Concept uses the TransactionService to access properties on Grakn. Requests come back and are converted by the ResponseReader, which takes gRPC messages from the server and returns, among other things, instances of Concepts.
All of our drivers have faced this issue and worked around it in different ways: Node uses dependency injection (instantiate the circular dependencies at an earlier point and then assign them into each other), Java lumps together much of the dependent functionality (and actually has a few circular imports), and Python allows circular imports as long as you follow certain import styles.
Compiling and Importing gRPC/protobufs
Each supported language has its own compiler. For Python, we invoke our Makefile, which in turn calls the Python compiler, grpc_tools.protoc. You may run into problems importing the resulting modules into your programs (this was a major pain point in Python), because the packages declared in the .proto files don't match the target folder structure. Our solution was to extend the Makefile: create the target folder structure, copy and update the .proto files to reflect this structure, run protoc, and then delete the copied .proto files. Try to avoid symlinks or independent copies of the protocol definition files.
Tests
Tests are an important part of our drivers! Since drivers are the main entry point to using Grakn, we want to ensure as much correctness as we can. Luckily, any new driver can more or less copy our Python or Node tests, modified to suit your language's framework and test style.
Good Luck :)
We hope this post both illuminates the new Python driver and acts as a guide for implementing your own language's client for Grakn! If you have any questions at all, want to collaborate, or just say hi, join our community Slack, or email me at joshua@grakn.ai.
Published at DZone with permission of Joshua Send, DZone MVB. See the original article here.