GSQL for Cypher Users
This article is intended for Neo4j Cypher users who want to learn and understand TigerGraph’s GSQL query language.
This is by no means a primer on GSQL. For that, the definitive place to start is TigerGraph’s documentation site. There are also a number of YouTube videos on both GSQL and GraphStudio, an excellent graph visualization and exploration tool that also functions as an Integrated Development Environment (IDE) for GSQL developers, handling everything from schema design to query development and execution.
A common question from TigerGraph prospects who know Cypher and want to learn GSQL is, “Do you have an example of the movie database from Neo4j?” So I thought it would be useful to share an implementation of that movie database in GSQL as a learning resource. The idea is to provide a bridge from Cypher to GSQL for existing Cypher users.
Both TigerGraph and Neo4j are native graph databases, but architecturally there are significant differences between the two products. One major difference is MPP (Massively Parallel Processing): TigerGraph is a distributed graph database that scales both vertically and horizontally. It not only stores data in a distributed fashion but also processes it in parallel, and its underlying data structures are distributed by design.
Cypher is an interpreted language, while GSQL can run in either interpreted or compiled mode (behind the scenes, a compiled query is translated into C++ and eventually into native Linux executable code). Interpreted mode brings flexibility, while compiled mode delivers higher performance; you can choose according to your application's needs.
GSQL is a high-level, Turing-complete programming language; Cypher is not. You can implement the equivalent of a stored procedure for a graph in GSQL. Classic graph algorithms such as cosine similarity, shortest path, community detection, and PageRank can be implemented natively in GSQL; that is, the graph processing and computation are done inside the database as the nodes and edges are traversed in parallel. Cypher, on the other hand, relies on Java procedure libraries such as APOC for graph algorithms rather than expressing them in the query language itself. A proper comparison of the two products is outside the scope of this article.
Here is how the material is presented. The Cypher queries are presented in textual form in this article, and the corresponding GSQL queries are available as a GraphStudio solution at this GitHub link.
GraphStudio is TigerGraph’s visual UI and also an IDE for GSQL developers, so you can view the GSQL code as well as run it to see results from the pre-populated movie database.
When you are ready to write some GSQL queries, you can download and install TigerGraph’s Developer Edition (there is a free trial). Here is the documentation link, and there are also helpful videos on YouTube. Once the TigerGraph server and GraphStudio are up and running, download the TigerGraph movie database example from here and follow the README instructions.
Let’s walk through an example. Here’s a sample query in Cypher, which matches all Movie vertices (nodes).
MATCH (p:Movie) RETURN p
Here is the equivalent query in GSQL:
CREATE QUERY get_all_movies() FOR GRAPH mygraph {
  // Cypher equivalent:
  // MATCH (p:Movie) RETURN p
  m = {Movie.*};
  PRINT m;
}
Note that the query is inside a defined procedure, get_all_movies(), and the equivalent Cypher query is referenced in the comments. Anything after the “//” characters is a comment.
You can write custom business logic and graph algorithms in procedures just like the one above. This procedure is compiled; on successful compilation, it is also installed as a REST endpoint on port 9000, and you can call the query from your browser like this:
http://<tigergraph_host_ip>:9000/query/mygraph/get_all_movies
In a browser, the query returns JSON-formatted output. Running the query in GraphStudio shows the result as a graph of nodes and edges, and you can also see the JSON output there.
By putting graph database operations inside procedures like these, you are essentially defining a REST API layer for your graph database applications.
It is outside the scope of this article to cover GSQL in depth. Rather, the intent is to provide a cookbook approach for Cypher users who would like to learn GSQL by example using the familiar Neo4j movie database. It is highly recommended that you learn at least some basic GSQL before this exercise. A good place to start is GSQL 101 in the documentation (and the corresponding video on YouTube).
Here’s how you would run the get_all_movies query in GraphStudio: click Write Queries, and in the scrollable list under GSQL Queries, select get_all_movies. The procedure and its GSQL code appear on the right.
Click Run at the top to execute the query. The results appear in the bottom panel, where you can view them as a graph visualization. You can also select View JSON Result to see the results in JSON format.
The following is a list of Cypher queries from the Neo4j movie database. If you have used that database before in Neo4j’s tutorials, these queries should be very familiar to you. The equivalent GSQL procedures are listed.
1. Retrieve all nodes from the database.
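A minimal GSQL sketch for this item, modeled on get_all_movies() but not taken from the GitHub solution. The query name is hypothetical, and the Cypher comment is one common formulation rather than the article's original. The seed set {ANY} selects every vertex regardless of its type:

```gsql
// Hypothetical query name; mygraph is the graph used by get_all_movies().
CREATE QUERY get_all_vertices() FOR GRAPH mygraph {
  // Cypher: MATCH (n) RETURN n
  all_v = {ANY};   // seed set containing all vertices of every type
  PRINT all_v;
}
```

On a large graph this prints every vertex, so it is best kept to small datasets such as the movie database.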
2. Retrieve all Person nodes. Retrieve all Movie nodes.
3. Retrieve all movies that were released in a specific year.
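A minimal GSQL sketch for this item, not the procedure from the GitHub solution. It assumes Movie vertices carry an INT attribute named released; the query and parameter names are hypothetical, and the Cypher comment is one common formulation rather than the article's original. The year is a query parameter, and a vertex-only SELECT filters the seed set:

```gsql
// Hypothetical query; assumes Movie has an INT attribute named released.
CREATE QUERY movies_released_in(INT yr) FOR GRAPH mygraph {
  // Cypher: MATCH (m:Movie {released: $yr}) RETURN m
  movies = {Movie.*};
  matched = SELECT m
            FROM movies:m
            WHERE m.released == yr;   // keep only movies from the requested year
  PRINT matched;
}
```

Following the endpoint pattern shown earlier, the installed query could be called with the parameter in the query string, for example http://<tigergraph_host_ip>:9000/query/mygraph/movies_released_in?yr=2003.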
4. Retrieve all Movies released in a specific year, returning their titles.
5. Display title, released, and tagline values for every Movie node in the graph.
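A minimal GSQL sketch for this item, assuming Movie vertices have title, released, and tagline attributes named as in the Neo4j dataset. The query name is hypothetical, and the Cypher comment is one common formulation rather than the article's original. A bracketed attribute list after a vertex set in PRINT plays the role of Cypher's property projection:

```gsql
// Hypothetical query; assumes Movie attributes title, released, and tagline.
CREATE QUERY movie_details() FOR GRAPH mygraph {
  // Cypher: MATCH (m:Movie) RETURN m.title, m.released, m.tagline
  m = {Movie.*};
  // Project only these three attributes into the JSON output.
  PRINT m[m.title, m.released, m.tagline];
}
```

Item 4 combines the same kind of projection (title only) with the WHERE filter from item 3.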
6. Retrieve all people who wrote the movie Speed Racer.
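A minimal GSQL sketch for this item, not the procedure from the GitHub solution. It assumes an undirected edge type named WROTE between Person and Movie (as in the Neo4j dataset), so the traversal can start from the movie side; with a directed edge you would traverse from Person or use a reverse edge type. The query and parameter names are hypothetical, and the Cypher comment is one common formulation:

```gsql
// Hypothetical query; assumes an undirected WROTE edge between Person and Movie.
CREATE QUERY writers_of(STRING movie_title) FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person)-[:WROTE]->(m:Movie {title: $movie_title}) RETURN p
  movies = {Movie.*};
  writers = SELECT p
            FROM movies:m -(WROTE:e)-> Person:p   // one hop: movie to its writers
            WHERE m.title == movie_title;
  PRINT writers;
}
```

For this item the parameter would be movie_title=Speed Racer (URL-encoded as Speed%20Racer in a browser).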
7. Retrieve all movies that are connected to the person, Tom Hanks.
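A minimal GSQL sketch for this item. Cypher's untyped relationship pattern is approximated here by listing edge types in an alternation; the edge type names (ACTED_IN, DIRECTED, PRODUCED, WROTE, REVIEWED) are assumed to mirror the Neo4j relationships and to be undirected, and the query name is hypothetical. The Cypher comment is one common formulation, not necessarily the article's original:

```gsql
// Hypothetical query; the edge type names in the alternation are assumptions.
CREATE QUERY movies_connected_to(STRING person_name) FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person {name: $person_name})--(m:Movie) RETURN m
  people = {Person.*};
  movies = SELECT m
           FROM people:p -((ACTED_IN|DIRECTED|PRODUCED|WROTE|REVIEWED):e)-> Movie:m
           WHERE p.name == person_name;   // e.g. "Tom Hanks"
  PRINT movies;
}
```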
8. Retrieve information about the relationships Tom Hanks had with the set of movies retrieved earlier.
There’s no need to write a GSQL query for this. In the results panel from the previous query, double-click any of the movie nodes (red) returned. This expands the node to show all connected Persons for the selected movie. Look for the “Tom Hanks” Person node and double-click it to show its connections to the other movies returned.
9. Retrieve all movies that Tom Cruise acted in.
10. Retrieve all movies that Tom Cruise acted in.
11. Retrieve all actors that were born in the 70s.
12. Retrieve the actors who acted in the movie The Matrix who were born after 1960.
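A minimal GSQL sketch for this item, assuming Person has an INT attribute born, Movie has title, and ACTED_IN is an undirected edge type; the query and parameter names are hypothetical, and the Cypher comment is one common formulation. The point of interest is that one WHERE clause can filter on both endpoints of the hop:

```gsql
// Hypothetical query; assumes Person.born (INT), Movie.title, and an undirected ACTED_IN edge.
CREATE QUERY actors_born_after(STRING movie_title, INT born_after) FOR GRAPH mygraph {
  // Cypher: MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
  //         WHERE m.title = $movie_title AND a.born > $born_after
  //         RETURN a.name, a.born
  movies = {Movie.*};
  actors = SELECT a
           FROM movies:m -(ACTED_IN:e)-> Person:a
           WHERE m.title == movie_title AND a.born > born_after;  // source and target filters
  PRINT actors[actors.name, actors.born];
}
```

For this item the parameters would be movie_title = "The Matrix" and born_after = 1960.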
13. Retrieve all actors whose name begins with James.
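A minimal GSQL sketch for this item, assuming a Person.name attribute and an undirected ACTED_IN edge type; the query name is hypothetical, and the Cypher comment is one common formulation. GSQL's LIKE operator with a % wildcard stands in for Cypher's STARTS WITH, and because a vertex set holds each vertex once, no DISTINCT is needed:

```gsql
// Hypothetical query; assumes Person.name and an undirected ACTED_IN edge.
CREATE QUERY actors_named_james() FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person)-[:ACTED_IN]->(:Movie)
  //         WHERE p.name STARTS WITH 'James' RETURN DISTINCT p.name
  people = {Person.*};
  actors = SELECT p
           FROM people:p -(ACTED_IN:e)-> Movie:m   // restrict to people who acted
           WHERE p.name LIKE "James%";             // % matches any trailing characters
  PRINT actors[actors.name];
}
```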
14. Retrieve all REVIEW relationships from the graph where the summary of the review contains the string "fun", returning the movie title reviewed and the rating and summary of the relationship.
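A minimal GSQL sketch for this item, assuming an undirected edge type named REVIEWED (as in the Neo4j dataset) with a STRING summary and an INT rating; the tuple, accumulator, and query names are hypothetical, and the Cypher comment is one common formulation. Edge attributes are read through the edge alias e, and a global ListAccum of tuples collects one row per matching review:

```gsql
// Hypothetical query; assumes an undirected REVIEWED edge with STRING summary and INT rating.
CREATE QUERY fun_reviews() FOR GRAPH mygraph {
  // Cypher: MATCH (:Person)-[r:REVIEWED]->(m:Movie)
  //         WHERE r.summary CONTAINS 'fun'
  //         RETURN m.title, r.rating, r.summary
  TYPEDEF TUPLE<STRING title, INT rating, STRING summary> Review;
  ListAccum<Review> @@reviews;   // global list: one entry per matching edge
  people = {Person.*};
  reviewed = SELECT m
             FROM people:p -(REVIEWED:e)-> Movie:m
             WHERE e.summary LIKE "%fun%"   // case-sensitive substring match
             ACCUM @@reviews += Review(m.title, e.rating, e.summary);
  PRINT @@reviews;
}
```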
15. Retrieve all people who have produced a movie, but have not directed a movie.
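A minimal GSQL sketch for this item, assuming undirected PRODUCED and DIRECTED edge types between Person and Movie; the query name is hypothetical, and the Cypher comment is one common formulation. Instead of a negative pattern, it builds two vertex sets and takes their difference with MINUS:

```gsql
// Hypothetical query; assumes undirected PRODUCED and DIRECTED edges.
CREATE QUERY producers_not_directors() FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person)-[:PRODUCED]->(:Movie)
  //         WHERE NOT (p)-[:DIRECTED]->(:Movie) RETURN DISTINCT p
  people = {Person.*};
  producers = SELECT p FROM people:p -(PRODUCED:e)-> Movie:m;
  directors = SELECT p FROM people:p -(DIRECTED:e)-> Movie:m;
  matched = producers MINUS directors;   // vertex-set difference
  PRINT matched;
}
```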
16. Retrieve the movies and their actors where one of the actors also directed the movie.
17. Retrieve all movies that were released in the years 2000, 2004, 2008, returning their titles and release years.
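A minimal GSQL sketch for this item, assuming Movie has title and an INT released attribute; the query and parameter names are hypothetical, and the Cypher comment is one common formulation. The list of years is passed as a SET<INT> parameter and tested with IN:

```gsql
// Hypothetical query; assumes Movie.title and an INT attribute Movie.released.
CREATE QUERY movies_in_years(SET<INT> years) FOR GRAPH mygraph {
  // Cypher: MATCH (m:Movie) WHERE m.released IN [2000, 2004, 2008]
  //         RETURN m.title, m.released
  movies = {Movie.*};
  matched = SELECT m
            FROM movies:m
            WHERE m.released IN years;   // membership test against the parameter set
  PRINT matched[matched.title, matched.released];
}
```

Over REST, a set-valued parameter is typically supplied by repeating its name, for example ?years=2000&years=2004&years=2008; check the REST++ documentation for your TigerGraph version.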
18. Retrieve the movies in which an actor's role has the same name as the movie.
19. Retrieve all movies that Gene Hackman has acted in, along with the directors of those movies. In addition, retrieve the actors who acted in the same movies as Gene Hackman.
20. Retrieve all nodes that have a direct FOLLOWS relationship, in either direction, with the person named James Thompson.
21. Modify the previous query to retrieve nodes that are exactly three hops away.
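A minimal GSQL sketch for this item, assuming FOLLOWS is an undirected Person-to-Person edge type (so one traversal covers both directions) and Person has a name attribute; the query and variable names are hypothetical, and the Cypher comment is one common formulation. Each SELECT advances the frontier by one hop. Because this expands walks rather than Cypher paths, the third hop can include vertices (even the starting person) that Cypher's rule against reusing a relationship within a path would exclude:

```gsql
// Hypothetical query; assumes an undirected FOLLOWS edge between Person vertices.
CREATE QUERY follows_three_hops(STRING person_name) FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person {name: $person_name})-[:FOLLOWS*3]-(other) RETURN other
  people = {Person.*};
  seed = SELECT p FROM people:p WHERE p.name == person_name;
  hop1 = SELECT t FROM seed:s -(FOLLOWS:e)-> Person:t;   // exactly one hop
  hop2 = SELECT t FROM hop1:s -(FOLLOWS:e)-> Person:t;   // exactly two hops
  hop3 = SELECT t FROM hop2:s -(FOLLOWS:e)-> Person:t;   // exactly three hops
  PRINT hop3;
}
```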
22. Modify the previous query to retrieve nodes that are one and two hops away.
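A minimal GSQL sketch for this item under the same assumptions (an undirected FOLLOWS edge type, a Person.name attribute, hypothetical query and variable names; the Cypher comment is one common formulation). It computes the one-hop and two-hop frontiers separately and merges them with the UNION vertex-set operator, which also removes duplicates:

```gsql
// Hypothetical query; assumes an undirected FOLLOWS edge between Person vertices.
CREATE QUERY follows_one_or_two_hops(STRING person_name) FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person {name: $person_name})-[:FOLLOWS*1..2]-(other) RETURN DISTINCT other
  people = {Person.*};
  seed = SELECT p FROM people:p WHERE p.name == person_name;
  hop1 = SELECT t FROM seed:s -(FOLLOWS:e)-> Person:t;
  hop2 = SELECT t FROM hop1:s -(FOLLOWS:e)-> Person:t;
  found = hop1 UNION hop2;   // vertex-set union, each vertex appears once
  PRINT found;
}
```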
23. Modify the previous query to retrieve particular nodes that are connected no matter how many hops are required.
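A minimal GSQL sketch for this item, written as a breadth-first search in a WHILE loop. It assumes an undirected FOLLOWS edge type and a Person.name attribute; the query, accumulator, and variable names are hypothetical, and the Cypher comment is one common formulation. A per-vertex OrAccum flag marks visited people, so each vertex enters the frontier at most once and the loop terminates when no new vertices are reached:

```gsql
// Hypothetical query; assumes an undirected FOLLOWS edge between Person vertices.
CREATE QUERY follows_any_hops(STRING person_name) FOR GRAPH mygraph {
  // Cypher: MATCH (p:Person {name: $person_name})-[:FOLLOWS*]-(other) RETURN DISTINCT other
  OrAccum @visited;   // per-vertex flag: has this person been reached?
  people = {Person.*};
  frontier = SELECT p FROM people:p
             WHERE p.name == person_name
             POST-ACCUM p.@visited += TRUE;   // mark the starting person
  WHILE frontier.size() > 0 DO
    // Expand one hop, keeping only people not reached before.
    frontier = SELECT t FROM frontier:s -(FOLLOWS:e)-> Person:t
               WHERE t.@visited == FALSE
               ACCUM t.@visited += TRUE;
  END;
  found = SELECT p FROM people:p
          WHERE p.@visited == TRUE AND p.name != person_name;   // exclude the start
  PRINT found;
}
```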
24. Write a query to retrieve all people in the graph whose name begins with Tom and optionally retrieve all people named Tom who directed a movie.
25. Retrieve actors and the movies they have acted in, returning each actor’s name and the list of movies they acted in.
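A minimal GSQL sketch for this item, assuming Person.name, Movie.title, and an undirected ACTED_IN edge type; the query and accumulator names are hypothetical, and the Cypher comment is one common formulation. A vertex-attached ListAccum plays the role of Cypher's collect(): each traversed edge appends a movie title to the actor's own list:

```gsql
// Hypothetical query; assumes Person.name, Movie.title, and an undirected ACTED_IN edge.
CREATE QUERY actors_and_movies() FOR GRAPH mygraph {
  // Cypher: MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
  //         RETURN a.name AS actor, collect(m.title) AS movies
  ListAccum<STRING> @movie_titles;   // per-actor list of titles, like collect()
  people = {Person.*};
  actors = SELECT a
           FROM people:a -(ACTED_IN:e)-> Movie:m
           ACCUM a.@movie_titles += m.title;
  PRINT actors[actors.name, actors.@movie_titles];
}
```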
26. Retrieve all movies that Tom Cruise has acted in and the co-actors that acted in the same movie, returning the movie title and the list of co-actors that Tom Cruise worked with.
27. Retrieve all people who reviewed a movie, returning the list of reviewers and how many reviewers reviewed the movie.
28. Retrieve all directors, their movies, and people who acted in the movies, returning the name of the director, the number of actors the director has worked with, and the list of actors.
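A minimal GSQL sketch for this item, showing how a two-hop pattern becomes two SELECT blocks that pass data through accumulators. It assumes undirected ACTED_IN and DIRECTED edge types and a Person.name attribute; the query and accumulator names are hypothetical, and the Cypher comment is one common formulation. The first block gathers each movie's cast on the movie vertex; the second pushes that cast to the movie's directors. Because it uses a SetAccum, the count is of distinct actors, whereas Cypher's count(a) counts one row per movie and actor (a ListAccum would keep duplicates):

```gsql
// Hypothetical query; assumes undirected ACTED_IN and DIRECTED edges.
CREATE QUERY directors_and_actors() FOR GRAPH mygraph {
  // Cypher: MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person)
  //         RETURN d.name, count(a), collect(a.name)
  SetAccum<STRING> @cast;       // per movie: names of its actors
  SetAccum<STRING> @actors;     // per director: actors worked with
  SumAccum<INT> @num_actors;    // per director: size of @actors
  movies = {Movie.*};
  // Hop 1: collect each movie's cast on the movie vertex.
  movies = SELECT m FROM movies:m -(ACTED_IN:e)-> Person:a
           ACCUM m.@cast += a.name;
  // Hop 2: merge each movie's cast into its director(s).
  directors = SELECT d FROM movies:m -(DIRECTED:e)-> Person:d
              ACCUM d.@actors += m.@cast
              POST-ACCUM d.@num_actors = d.@actors.size();
  PRINT directors[directors.name, directors.@num_actors, directors.@actors];
}
```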
29. Retrieve the actors who have acted in exactly five movies, returning the name of the actor, and the list of movies for that actor. There are 2 versions in GSQL.
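A minimal GSQL sketch of one way to write this item, not either of the two versions in the GitHub solution. It assumes Person.name, Movie.title, and an undirected ACTED_IN edge type; the query and accumulator names are hypothetical, and the Cypher comment is one common formulation. Accumulators count and collect movies per actor, and HAVING filters after aggregation, much like Cypher's WITH ... WHERE:

```gsql
// Hypothetical query; assumes Person.name, Movie.title, and an undirected ACTED_IN edge.
CREATE QUERY actors_in_five_movies() FOR GRAPH mygraph {
  // Cypher: MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
  //         WITH a, collect(m.title) AS movies WHERE size(movies) = 5
  //         RETURN a.name, movies
  SumAccum<INT> @movie_count;
  ListAccum<STRING> @titles;
  people = {Person.*};
  actors = SELECT a
           FROM people:a -(ACTED_IN:e)-> Movie:m
           ACCUM a.@movie_count += 1, a.@titles += m.title
           HAVING a.@movie_count == 5;   // filter after the per-actor aggregation
  PRINT actors[actors.name, actors.@titles];
}
```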
30. Retrieve the movies that have at least 2 directors, and optionally the names of people who reviewed the movies.
Next Steps
I hope this article helps bridge the gap between Cypher and GSQL users and brings the two communities closer together.
Like any language, GSQL continues to evolve. At the time of this writing, it has been announced that multi-hop patterns in the FROM clause will be added to the language. This will make it simpler and more intuitive to describe multi-hop graph pattern matching in a single query block (SELECT-FROM-WHERE) instead of several individual blocks.