Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Moving Towards a Standard Property Graph Query Language

DZone 's Guide to

Moving Towards a Standard Property Graph Query Language

See why we should be moving towards a standard property graph query language.

· Database Zone ·
Free Resource

Graph models are seeing more momentum and adoption. The query language on property graphs is arguably the hottest topic for graph database users. 

There are two competing demands in the property graph query language: ease-of-use vs. fine control. 

Ease-of-use requires more declarative design (pattern matching), while fine-control requires an accumulator. The market sees a more declarative trend, both in commercial interests and the ANSI standard efforts. However, real-world customer feedback shows a lack of sufficiency in pattern matching only. There is a pressing demand on fine-control of graph algorithms, which can be fulfilled by an accumulator. 

In this article, the marriage of pattern matching and the accumulator is presented. We first present pattern matching, followed by an example of an accumulator used in the ACCUM clause of a pattern-matching block. 

Pattern matching enables users to focus on specifying what multi-hop path pattern they want in a query block without worrying about the direction of the underlying data flow. They can think about the pattern as one query block (SELECT-FROM-WHERE) instead of several individual blocks. The pattern can be fixed length or variable length (using Kleene stars for repeated hops), which significantly shortens the GSQL query length.

All examples below focus on the FROM clause, where the path pattern is specified and the aliases are bound. Other clauses such as WHERE clause, ACCUM clause, etc. are omitted for brevity.

1-Hop Atomic Pattern

Let’s start with the basics. Below, we first show a 1-hop fix length pattern and a 1-hop variable length pattern (star pattern). The description of each pattern is under each FROM clause. For a given edge, if it is undirected, we do not put any decoration around it. For a directed edge, we use “>” or “<” to indicate its direction, such that a user can know the source and target vertex types based on the edge definition.

  • 1-hop pattern
    1. FROM X:x – (E1:e1) – Y:y
      • Undirected edge
    2. FROM X:x – (E2>:e2) – Y:y
      • Right directed edge, x binds to the source of E2, y binds to the target of E2.
    3. FROM X:x – (<E3:e3) – Y:y
      • Left directed edge, y binds to the source of E3, x binds to the target of E3.
    4. FROM X:x – (_:e) – Y:y
      • Any undirected edge
    5. FROM X:x – (_>:e) – Y:y
      • Any right directed edge
    6. FROM X:x – (<_:e) – Y:y
      • Any left directed edge
    7. FROM X:x – ((<_|_):e) – Y:y
      • Any left directed or any undirected
    8. FROM X:x – ((E1|E2>|<E3):e) – Y:y
      • Disjunctive 1-hop edge. (x,y) are connected by E1 or E2 or E3.
    9. FROM X:x – () – Y:y
      • any edge (directed or undirected) match this 1-hop pattern
      • Same as this: (<_|_>|_)
  • 1-hop star pattern — repetition of an edge pattern, 0 or more times
    1. FROM X:x – (E1*) – Y:y
    2. FROM X:x – (E2>*) – Y:y
    3. FROM X:x – (<E3*) – Y:y
    4. FROM X:x – (_*) – Y:y
    5. FROM X:x – ((E1|E2>|<E3)*) – Y:y
  • 1-hop star pattern with bounds
    1. FROM X:x – (E1*2..) – Y:y
      • Lower bounds only. There is a chain of at least 2 E1 edges.
    2. FROM X:x – (E2>*..3) – Y:y
      • Upper bounds only. There is a chain of between 0 and 3 E2 edges.
    3. FROM X:x – (<E3*3..5) – Y:y
      • Both Lower and Upper bounds. There is a chain of 3 to 5 E3 edges.
    4. FROM X:x – ((E1|E2>|<E3)*3) – Y:y
      • Exact bound. There is a chain of exactly 3 edges, where each edge is either E1, E2>, or <E3. 

Multiple Hop Pattern — Succinct Representation

With the above single hop pattern, we can extend it to a multiple hop pattern. Below are some examples to go to 2 and 3 hops.

  • 2-hop pattern
    • FROM X:x-(E1:e1)-Y:y-(E2>:e2)-Z:z
      • If we don’t need to refer to y in any expression, we can use the following syntax sugar to concatenate E1 and E2.
        • FROM X:x-(E1:e1.E2>:e2)-Z:z
      • If we don’t need to refer to e1 or e2 either, we can further simplify the path expression:
        • FROM X:x-(E1.E2>)-Z:z
  • 3-hop pattern
    • FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u
      • If we only need to reference the end vertices and not any of the intermediate vertices or edges, we can shorten the expression:
        • FROM X:x-(E2>.<E3.E4)-Y:y

An Example Based on the LDBC Benchmark

The LDBC Social Network Benchmark employs a social forum schema and has three query workloads to ask different types of queries on this social data.

For example, IC_6 ask the following question:

Given a start Person and some Tag, find the other Tags that occur together with this Tag on Posts that were created by start Person’s friends and friends of friends (excluding start Person). Return top 10 Tags, and the count of Posts that were created by these Persons, which contain both this Tag and the given Tag.

Below is the GSQL implementation using the pattern match syntax.

CREATE QUERY ic_6(VERTEX<Person> personId, STRING tagName) FOR GRAPH ldbc_snb {
  TYPEDEF tuple<STRING tagName, INT postCount> tagStats;

  SetAccum<VERTEX<Post>> @@postAll;
  SumAccum<INT> @postCount;
  HeapAccum<tagStats>(10, postCount DESC, tagName ASC) @@tagStatsTop;

  vPerson = { personId };
  aggPersonPostTag =
    SELECT t3
    FROM vPerson:s
        -((Person_KNOWS_Person>|<Person_KNOWS_Person)*1..2)-Person:t1
        -(<Post_HAS_CREATOR_Person:e2)-Post:t2
        -(Post_HAS_TAG_Tag>:e3)-Tag:t3
    WHERE s != t1
    ACCUM CASE WHEN t3.name == tagName THEN @@postAll += t2 END;

  vPost = { @@postAll };
  vTag =
    SELECT t
    FROM vPost:s-(Post_HAS_TAG_Tag>:e)-Tag:t
    ACCUM CASE WHEN t.name != tagName THEN t.@postCount += 1 END
    POST-ACCUM @@tagStatsTop += tagStats(t.name, t.@postCount);

  PRINT @@tagStatsTop;
}

SQL Copy

As seen above, the accumulator is applied to the matched path pattern, which gives fine control to the graph query writer on the computation side. 

In conclusion, we see significant ease-of-use improvement with the pattern match syntax extension. We also see the coexistence of accumulation that empowers pattern match without hindering its simplicity. Their marriage guides the user to focus on declarative thinking without losing the power of computing needs. We hope you will enjoy it and please send your valuable feedback to us.

Topics:
graph query languages ,graph database ,database ,graph analytics ,1-hop atomic pattern ,tutorial ,Standard Property Graph Query Language

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}