Reading the NSA's Codebase: LemonGraph Review Part 2: Storing Edges and Properties
In the last post, I was able to follow up on the first two lines. Now, let’s see how the edge is actually used and how properties are stored.
Join the DZone community and get the full member experience.
Join For FreeThe last post left us figuring out how LemonGraph is storing nodes and edges. There are the special properties type and val that are used for this kind of lookup. I find it interesting that these two properties are treated in a special manner. It seems like it would be more useful to have such distinction completely arbitrary or use a single property. A more common approach would be to just have a unique key for each node/edge and be done with it. I’m guessing that this is needed because you might want to do partial searches. So give me all the nodes whose type is X without caring what the value is.
Here is the sample code from the project’s page:
In the last post, I was able to follow up on the first two lines. Now, let’s see how the edge is actually used and how properties are stored. The edge() method call ends up calling the __edge_resolve() method:
This is very similar to what we saw with __node_resolve() method. This might be a good time to stop and consider the edge_t and node_t structures:
They seem pretty simple. The is_new:1 syntax is used by C for bitfields. In other words, the is_new field takes a single bit, while the rectype takes 7 bits.
A node has an ID, the record type, the next pointer (which I assume is used for historical queries) and the type/val fields. The edge is pretty much the same shape, except that it has the source and target fields as well.
I’m going to skip detailed dive into the internals of __edge_resolve(), they are nearly identical as the __node_resolve() one. I do want to note that LemonGraph define the following indexes on edges:
- type, val, src, tgt –> id
- src, type –> id
- tgt, type –> id
In other words, it can very quickly find matches for queries on:
- Edges by type and values
- Edges from a source of a particular type
- Edges to a target of a particular type
Setting a property ends up calling _prop_resolve() which looks like this:
Again, very similar to what we have seen before. This is a good thing. Predictability in a code base is a very nice property (pun intended).
Properties are indexes on:
- parent, key –> id
In other words, you can look up the properties of a node (or an edge) based on its parent or type. Interestingly enough, it isn’t indexing the value of a property. I would expect to have an index on (key, value –> id), which would allow for richer queries. On the other hand, I guess they do most / all of their queries on the type/val pair as the source.
With this, I feel like I have a good grasp on what is going on in LemonGraph in terms of the on-disk data structure. In the next post, I’m going to try to figure out how it is handling queries.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments