GSQL for Cypher Users

This article is intended for Neo4j Cypher users who want to learn and understand TigerGraph’s GSQL query language.

Victor Moey

Updated Jul. 04, 19 · Tutorial

Likes (4)

Comment

Save

7.3K Views

This is by no means a primer on GSQL. For that, the definitive place to start is TigerGraph’s documentation site. There are also a number of videos on YouTube to learn both GSQL and GraphStudio, which is an excellent graph visualization and exploration tool. It also functions as an Integrated Development Environment (IDE) for GSQL developers, handling everything from schema design to query development and execution.

A common question from the TigerGraph prospects who know Cypher and want to learn GSQL is “Do you have an example of the movie database from Neo4j?”. So, I thought it would be interesting to share an implementation of that movie database in GSQL as a learning resource. The idea is to provide a bridge for existing Cypher users to GSQL.

Both TigerGraph and Neo4j are native graph databases, but architecturally speaking, there are significant differences between the two products. One major difference is that of MPP (Massively Parallel Processing); TigerGraph is a distributed graph that is both vertically and horizontally scalable. TigerGraph database not only stores the data distributedly but also processes it in parallel. The underlying data structures are inherently distributed in design.

Cypher is an interpreted language (like Java) while GSQL can run both in interpreted and compiled mode ( behind the scene C++ and eventually into native Linux executable code). Interpreted mode brings dynamism, while compiled mode gives a higher performance; the choice is yours per your application needs.

GSQL is a high-level programming language that is Turing complete. Cypher is not. You can implement the equivalent of a stored procedure for a graph in GSQL. Classic graph algorithms like cosine similarity, shortest path, community detection, and PageRank can be implemented natively in GSQL, i.e., the graph processing and computation are done inside the database, as the nodes and edges are being traversed in parallel. Cypher, on the other hand, relies on Neo4j's APOC Java library for graph algorithms (i.e., graph processing is done outside the database). A proper comparison of the two products is outside the scope of this article.

Let’s describe how the material is presented here. The Cypher queries are presented in textual form in this article.

The corresponding GSQL queries are available as a GraphStudio solution on this GitHub link.

GraphStudio is TigerGraph’s Visual UI. It is also an IDE for GSQL developers. So you will be able to view the GSQL code as well as run it to see results from the pre-populated movie database.

You can download and install TigerGraph’s Developer Edition when you are ready to write some GSQL queries (there is a free trial). Here is the documentation link. There are also videos on YouTube which are very helpful. Once the TigerGraph server and GraphStudio is up and running, you can download the Tigergraph movie database example from here, and follow the README instructions.

Let’s walk through an example. Here’s a sample query in Cypher, which matches all Movie vertices (nodes).

MATCH (p:Movie) RETURN p

Here is the equivalent query in GSQL:

CREATE QUERY get_all_movies() FOR GRAPH mygraph {
  // Cypher equivalent: 
  //     MATCH (p:Movie) RETURN p
   m = {Movie.*};
   PRINT m;
 }

Note the query is inside a defined procedure, get_all_movies(), and the equivalent Cypher query is referenced in the comments. Anything after the “//” characters are comments.

You can write custom business logic and graph algorithms in procedures just like the one above. This procedure is compiled. On a successful compilation, the procedure is also installed as a REST endpoint on port 9000 and you can access the above query on your browser like this.

http://<tigergraph_host_ip>:9000/query/mygraph/get_all_movies

The result of this query on a browser is JSON formatted output. Running this query in GraphStudio will show the graph result as nodes and edges. You can also see the JSON output in GraphStudio.

By putting graph database operations inside procedures like these, you are essentially defining a REST API layer for your graph database applications.

It is outside the scope of this article to cover GSQL in depth. Rather the intent is to provide a cookbook approach for Cypher users who would like to learn GSQL by example using the familiar Neo4j movie database. It’s highly recommended that you at least learn some basic GSQL prior to this exercise. A good place to start is GSQL 101 in the (corresponding on YouTube).

Here’s how you would run the get_all_movies query in GraphStudio: Click on Write Queries and in the scroll bar under GSQL queries, select get_all_movies. The procedure and the GSQL code appear on the right.

Click on Run at the top to run the query and results appear in the bottom panel where you can view the graph visualization of the results. You can also select View JSON Result for the results in JSON format.

The following is a list of Cypher queries from the Neo4j movie database. If you have used that database before in Neo4j’s tutorials, these queries should be very familiar to you. The equivalent GSQL procedures are listed.

1. Retrieve all nodes from the database.

Cypher	GSQL procedure in GraphStudio
`MATCH (n) RETURN n`	`get_whole_graph()`

2. Write a query to retrieve all Person nodes. Write a query to retrieve all Movie nodes.

Cypher	GSQL procedure in GraphStudio
`MATCH (p:Person) RETURN p`	`get_all_persons()`
`MATCH (m:Movie) RETURN m`	`get_all_movies()`

3. Retrieve all movies that were released in a specific year.

Cypher	GSQL procedure in GraphStudio
`MATCH (m:Movie {released:2003}) RETURN m`	`get_movies_for_year (UINT year)`

4. Retrieve all Movies released in a specific year, returning their titles.

Cypher	GSQL procedure in GraphStudio
`MATCH (m:Movie {released: 2006}) RETURN m.title`	`get_movie_titles_for_year (UINT year)`

5. Display title, released, and tagline values for every Movie node in the graph.

Cypher	GSQL procedure in GraphStudio
`MATCH (m:Movie) RETURN m.title, m.released, m.tagline`	`get_all_movie_details()`

6. Retrieve all people who wrote the movie Speed Racer.

Cypher	GSQL procedure in GraphStudio
`MATCH (p:Person)-[:WROTE]->(:Movie {title: 'Speed Racer'}) RETURN p.name`	`get_persons_who_wrote_movie(Vertex<Movie>mov)`

7. Retrieve all movies that are connected to the person, Tom Hanks.

Cypher	GSQL procedure in GraphStudio
`MATCH (m:Movie)<--(:Person {name: 'Tom Hanks'}) RETURN m.title`	`get_movies_related_to_person(Vertex<Person>p)`

8. Retrieve information about the relationships Tom Hanks had with the set of movies retrieved earlier.

There’s no need to write a GSQL query for this. In the results panel from the previous query, simply double-click on any of the movie nodes (red) returned. This expands to show all connected Persons for the selected movie. Look for the “Tom Hanks” Person node, and double click on it to show connections to other movies returned.

9. Retrieve all movies that Tom Cruise acted in.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.name = 'Tom Cruise' 
RETURN m.title as Movie

print_movies_for_actor ( Vertex<Person> actor)

10. Retrieve all movies that Tom Cruise acted in.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.name = 'Tom Cruise' RETURN m.title as Movie

print_movies_for_actor (Vertex<Person> actor)

11. Retrieve all actors that were born in the 70s.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)
WHERE a.born >= 1970 AND a.born < 1980
RETURN a.name as Name, a.born as `Year Born`

get_actors_born_in (UINT from_year, UINT to_year)

12. Retrieve the actors who acted in the movie The Matrix who were born after 1960.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.born > 1960 AND m.title = 'The Matrix'
RETURN a.name as Name, a.born as `Year Born`

get_actors_for_movie_born_after (Vertex<Movie> movie, UINT after_year)

13. Retrieve all actors whose name begins with James.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:ACTED_IN]->(:Movie)
WHERE a.name STARTS WITH 'James'
RETURN a.name

get_actors_whose_name_begins_with (String pattern)

14. Retrieve all REVIEW relationships from the graph where the summary of the review contains the string "fun," returning the movie title reviewed and the rating and summary of the relationship.

Cypher

GSQL procedure in GraphStudio

MATCH (:Person)-[r:REVIEWED]->(m:Movie)
WHERE toLower(r.summary) CONTAINS 'fun'
RETURN m.title as Movie, r.summary as Review, r.rating as Rating

get_reviews_with_str (String str)

15. Retrieve all people who have produced a movie, but have not directed a movie.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:PRODUCED]->(m:Movie)
WHERE NOT ((a)-[:DIRECTED]->(:Movie))
RETURN a.name, m.title

get_producers_who_are_not_directors()

16. Retrieve the movies and their actors where one of the actors also directed the movie.

Cypher

GSQL procedure in GraphStudio

MATCH (a1:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(a2:Person)
WHERE exists( (a2)-[:DIRECTED]->(m) )
RETURN a1.name as Actor, a2.name as `Actor/Director`, m.title as Movie

get_movies_with_actor_directors()

17. Retrieve all movies that were released in the years 2000, 2004, 2008, returning their titles and release years.

Cypher

GSQL procedure in GraphStudio

MATCH (m:Movie)
WHERE m.released in [2000, 2004, 2008]
RETURN m.title, m.released

get_movies_release_in(Set<INT> years)

18. Retrieve the movies that have an actor's role that is the name of the movie.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
WHERE m.title in r.roles
RETURN m.title as Movie, a.name as Actor

get_movie_with_same_name_as_role()

19. Write a query that retrieves all movies that Gene Hackman has acted it, along with the directors of the movies. In addition, retrieve the actors that acted in the same movies as Gene Hackman.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person),
(a2:Person)-[:ACTED_IN]->(m) WHERE a.name = 'Gene Hackman' 
RETURN m.title as movie, d.name AS director, a2.name AS `co-actors`

get_movies_for_actor_expand (Vertex<Person> actor)

20. Retrieve all nodes that the person named James Thompson directly has the FOLLOWS relationship in either direction.

Cypher

GSQL procedure in GraphStudio

MATCH (p1:Person)-[:FOLLOWS]-(p2:Person)
WHERE p1.name = 'James Thompson' RETURN p1, p2

get_followers_for(Vertex<Person> p)

21. Modify the previous query to retrieve nodes that are exactly three hops away.

Cypher

GSQL procedure in GraphStudio

MATCH (p1:Person)-[:FOLLOWS*3]-(p2:Person)
WHERE p1.name = 'James Thompson'
RETURN p1, p2

k_hop(Vertex<Person> p, int k)

22. Modify the previous query to retrieve nodes that are one and two hops away.

Cypher

GSQL procedure in GraphStudio

MATCH (p1:Person)-[:FOLLOWS*1..2]-(p2:Person)
WHERE p1.name = 'James Thompson' RETURN p1, p2

k_hop_inclusive(Vertex<Person> p, int k)

23. Modify the previous query to retrieve particular nodes that are connected no matter how many hops are required.

Cypher

GSQL procedure in GraphStudio

MATCH (p1:Person)-[:FOLLOWS*]-(p2:Person)
WHERE p1.name = 'James Thompson'
RETURN p1, p2

get_connected_subgraph(Vertex<Person> p)

24. Write a query to retrieve all people in the graph whose name begins with Tom and optionally retrieve all people named Tom who directed a movie.

Cypher

GSQL procedure in GraphStudio

MATCH (p:Person) WHERE p.name STARTS WITH 'Tom' 
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN p.name, m.title

get_persons_whose_name_begins_with(String pattern)

25. Retrieve actors and the movies they have acted in, returning each actor’s name and the list of movies they acted in.

Cypher

GSQL procedure in GraphStudio

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name as actor, collect(m.title) AS `movie list`

get_actors_and_movies()

26. Retrieve all movies that Tom Cruise has acted in and the co-actors that acted in the same movie, returning the movie title and the list of co-actors that Tom Cruise worked with.

Cypher	GSQL procedure in GraphStudio
`MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(p2:Person) WHERE p.name ='Tom Cruise'`	`get_movie_and_co_actors(Vertex<Person> actor)`

27. Retrieve all people who reviewed a movie, returning the list of reviewers and how many reviewers reviewed the movie.

Cypher

GSQL procedure in GraphStudio

MATCH (p:Person)-[:REVIEWED]->
(m:Movie)
RETURN m.title as movie, count(p) as numReviews, collect(p.name) as reviewers

get_list_of_reviewers_per_movie()

28. Retrieve all directors, their movies, and people who acted in the movies, returning the name of the director, the number of actors the director has worked with, and the list of actors.

Cypher	GSQL procedure in GraphStudio
MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) RETURN d.name AS director, count(a) AS `number actors`, collect(a.name) AS `actors worked with`	`get_directors_movies_and_actors()`

29. Retrieve the actors who have acted in exactly five movies, returning the name of the actor, and the list of movies for that actor. There are 2 versions in GSQL.

Cypher

GSQL procedure in GraphStudio

MATCH (a:Person)-[:ACTED_IN]->(m:Movie) WITH a, count(a) 
AS numMovies, collect(m.title) AS movies WHERE 
numMovies = 5 
RETURN a.name, movies

get_actors_with_n_movies(UINT n)
//get_actors_with_n_movies2(UINT n)

30. Retrieve the movies that have at least 2 directors, and optionally the names of people who reviewed the movies.

Cypher

GSQL procedure in GraphStudio

MATCH (m:Movie)
WITH m, size((:Person)-[:DIRECTED]->
(m)) AS directors
WHERE directors >= 2
OPTIONAL MATCH (p:Person)-[:REVIEWED]->
(m) RETURN m.title, p.name

get_movies_with_2_directors()

Next Steps

I hope this article helps bridge the gap between Cypher and GSQL users and brings the two communities closer together.

Like any language, GSQL continues to evolve. At the time of this writing, it has been announced that multi-hop patterns in the FROM clause will be added to the language. This will make it simpler and more intuitive to describe multi-hop graph pattern matching in a single query block (SELECT-FROM-WHERE) instead of several individual blocks.

Database Graph (Unix)

Opinions expressed by DZone contributors are their own.

Related

Trending

GSQL for Cypher Users

This article is intended for Neo4j Cypher users who want to learn and understand TigerGraph’s GSQL query language.

Next Steps

Related

Partner Resources