DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Reading the NSA's Codebase: LemonGraph Review Part 2: Storing Edges and Properties

In the last post, I was able to follow up on the first two lines. Now, let’s see how the edge is actually used and how properties are stored.

Oren Eini user avatar by
Oren Eini
·
Aug. 14, 18 · Review
Like (1)
Save
Tweet
Share
3.26K Views

Join the DZone community and get the full member experience.

Join For Free

The last post left us figuring out how LemonGraph is storing nodes and edges. There are the special properties type and val that are used for this kind of lookup. I find it interesting that these two properties are treated in a special manner. It seems like it would be more useful to have such distinction completely arbitrary or use a single property. A more common approach would be to just have a unique key for each node/edge and be done with it. I’m guessing that this is needed because you might want to do partial searches. So give me all the nodes whose type is X without caring what the value is.

Here is the sample code from the project’s page:

image

In the last post, I was able to follow up on the first two lines. Now, let’s see how the edge is actually used and how properties are stored. The edge() method call ends up calling the __edge_resolve() method:

image

This is very similar to what we saw with __node_resolve() method. This might be a good time to stop and consider the edge_t and node_t structures:

image

They seem pretty simple. The is_new:1 syntax is used by C for bitfields. In other words, the is_new field takes a single bit, while the rectype takes 7 bits.

A node has an ID, the record type, the next pointer (which I assume is used for historical queries) and the type/val fields. The edge is pretty much the same shape, except that it has the source and target fields as well.

I’m going to skip detailed dive into the internals of __edge_resolve(), they are nearly identical as the __node_resolve() one. I do want to note that LemonGraph define the following indexes on edges:

  • type, val, src, tgt –> id
  • src, type –> id
  • tgt, type –> id

In other words, it can very quickly find matches for queries on:

  • Edges by type and values
  • Edges from a source of a particular type
  • Edges to a target of a particular type

Setting a property ends up calling _prop_resolve() which looks like this:

image

Again, very similar to what we have seen before. This is a good thing. Predictability in a code base is a very nice property (pun intended).

Properties are indexes on:

  • parent, key –> id

In other words, you can look up the properties of a node (or an edge) based on its parent or type. Interestingly enough, it isn’t indexing the value of a property. I would expect to have an index on (key, value –> id), which would allow for richer queries. On the other hand, I guess they do most / all of their queries on the type/val pair as the source.

With this, I feel like I have a good grasp on what is going on in LemonGraph in terms of the on-disk data structure. In the next post, I’m going to try to figure out how it is handling queries.

Property (programming)

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • How to Submit a Post to DZone
  • The Quest for REST
  • API Design Patterns Review
  • A Brief Overview of the Spring Cloud Framework

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: