Neo4j & Cypher: Finding Movies by Decade
I was recently asked how to find the number of movies produced per decade in the movie data set that comes with the Neo4j browser and can be imported with the following command:
:play movies
We want to get one row per decade and have a count alongside so the easiest way is to start with one decade and build from there.
MATCH (movie:Movie)
WHERE movie.released >= 1990 AND movie.released <= 1999
RETURN 1990 + "-" + 1999 AS years, count(movie) AS movies
ORDER BY years
Note that we’re doing a label scan of all nodes labelled Movie, as there are no indexes for range queries. That’s fine here because we have few movies, but if we had hundreds of thousands of them we’d want to optimise the WHERE clause to use IN, which would then make use of any indexes.
If we run the query we get the following result:
==> +----------------------+
==> | years       | movies |
==> +----------------------+
==> | "1990-1999" | 21     |
==> +----------------------+
==> 1 row
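The IN-based rewrite mentioned earlier might look something like the sketch below. It assumes an index exists on the released property of Movie nodes, which the stock movies data set does not create for you; the index statements in the comments are an assumption about your setup, and the syntax depends on the Neo4j version.

```cypher
// Assumption: an index on :Movie(released) has been created first, e.g.
//   CREATE INDEX ON :Movie(released)                (older syntax)
//   CREATE INDEX FOR (m:Movie) ON (m.released)      (Neo4j 4.x and later)

// range(1990, 1999) expands to the list [1990, 1991, ..., 1999],
// so the predicate becomes a membership test rather than a range comparison.
MATCH (movie:Movie)
WHERE movie.released IN range(1990, 1999)
RETURN "1990-1999" AS years, count(movie) AS movies
```

Whether the planner actually picks an index seek is worth confirming by prefixing the query with EXPLAIN or PROFILE.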
Let’s pull out the start and end years so they’re explicitly named:
WITH 1990 AS startDecade, 1999 AS endDecade
MATCH (movie:Movie)
WHERE movie.released >= startDecade AND movie.released <= endDecade
RETURN startDecade + "-" + endDecade AS years, count(movie)
ORDER BY years
Now we need to create a collection of start and end years so we can return more than one decade. We can use the UNWIND clause to take a collection of decades and run each one through the rest of the query:
UNWIND [{start: 1970, end: 1979},
        {start: 1980, end: 1989},
        {start: 1990, end: 1999},
        {start: 2000, end: 2009},
        {start: 2010, end: 2019}] AS row
WITH row.start AS startDecade, row.end AS endDecade
MATCH (movie:Movie)
WHERE movie.released >= startDecade AND movie.released <= endDecade
RETURN startDecade + "-" + endDecade AS years, count(movie)
ORDER BY years
==> +----------------------------+
==> | years       | count(movie) |
==> +----------------------------+
==> | "1970-1979" | 2            |
==> | "1980-1989" | 2            |
==> | "1990-1999" | 21           |
==> | "2000-2009" | 13           |
==> | "2010-2019" | 1            |
==> +----------------------------+
==> 5 rows
Alistair pointed out that we can simplify this even further by using the RANGE function:
UNWIND range(1970, 2010, 10) AS startDecade
WITH startDecade, startDecade + 9 AS endDecade
MATCH (movie:Movie)
WHERE movie.released >= startDecade AND movie.released <= endDecade
RETURN startDecade + "-" + endDecade AS years, count(movie)
ORDER BY years
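As a further variation (not from the original discussion, just a sketch), the decade can be derived from each movie's release year with integer arithmetic, so the list of decades doesn't need to be spelled out at all. This assumes released is stored as an integer, as it is in the movies data set; toString() is used to make the string concatenation explicit.

```cypher
MATCH (movie:Movie)
WHERE movie.released IS NOT NULL
// Integer division drops the last digit: 1995 / 10 = 199, then 199 * 10 = 1990.
WITH movie.released / 10 * 10 AS startDecade, count(movie) AS movies
RETURN toString(startDecade) + "-" + toString(startDecade + 9) AS years, movies
ORDER BY years
```

This passes over the Movie nodes once rather than once per decade, at the cost of losing the explicit start and end bounds that the UNWIND version makes easy to adjust.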
And here’s a graph gist for you to play with.
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.