Over a million developers have joined DZone.

Neo4j & Cypher: Finding Movies by Decade

· Java Zone

Discover how AppDynamics steps in to upgrade your performance game and prevent your enterprise from these top 10 Java performance problems, brought to you in partnership with AppDynamics.

I was recently asked how to find the number of movies produced per decade in the movie data set that comes with the Neo4j browser and can be imported with the following command:

:play movies

We want to get one row per decade and have a count alongside so the easiest way is to start with one decade and build from there.

MATCH (movie:Movie)
WHERE movie.released >= 1990 and movie.released <= 1999
RETURN 1990 + "-" + 1999 as years, count(movie) AS movies
ORDER BY years

Note that we’re doing a label scan of all nodes of type Movie as there are no indexes for range queries. In this case it’s fine as we have few movies but If we had 100s of thousands of movies then we’d want to optimise the WHERE clause to make use of an IN which would then use any indexes.

If we run the query we get the following result:

==> +----------------------+
==> | years       | movies |
==> +----------------------+
==> | "1990-1999" | 21     |
==> +----------------------+
==> 1 row

Let’s pull out the start and end years so they’re explicitly named:

WITH 1990 AS startDecade, 1999 AS endDecade
MATCH (movie:Movie)
WHERE movie.released >= startDecade and movie.released <= endDecade
RETURN startDecade + "-" + endDecade as years, count(movie)
ORDER BY years

Now we need to create a collection of start and end years so we can return more than one. We can use the UNWIND function to take a collection of decades and run them through the rest of the query:

UNWIND [{start: 1970, end: 1979}, {start: 1980, end: 1989}, {start: 1980, end: 1989}, {start: 1990, end: 1999}, {start: 2000, end: 2009}, {start: 2010, end: 2019}] AS row
WITH row.start AS startDecade, row.end AS endDecade
MATCH (movie:Movie)
WHERE movie.released >= startDecade and movie.released <= endDecade
RETURN startDecade + "-" + endDecade as years, count(movie)
ORDER BY years
==> +----------------------------+
==> | years       | count(movie) |
==> +----------------------------+
==> | "1970-1979" | 2            |
==> | "1980-1989" | 2            |
==> | "1990-1999" | 21           |
==> | "2000-2009" | 13           |
==> | "2010-2019" | 1            |
==> +----------------------------+
==> 5 rows

Alistair pointed out that we can simplify this even further by using the RANGE function:

UNWIND range(1970,2010,10) as startDecade
WITH startDecade, startDecade + 9 as endDecade
MATCH (movie:Movie)
WHERE movie.released >= startDecade and movie.released <= endDecade
RETURN startDecade + "-" + endDecade as years, count(movie)
ORDER BY years

And here’s a graph gist for you to play with.

The Java Zone is brought to you in partnership with AppDynamics. AppDynamics helps you gain the fundamentals behind application performance, and implement best practices so you can proactively analyze and act on performance problems as they arise, and more specifically with your Java applications. Start a Free Trial.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}