How to Build a Collaborative Filtering Recommender Engine with Memgraph and Cypher
In this tutorial, you will learn how to build a simple movie recommender system leveraging Memgraph, Cypher, and a user-based collaborative filtering algorithm.
Introduction
A recommendation engine is a system that tries to suggest relevant items to users. These could be movies (e.g., Netflix), products (e.g., Amazon), flights (e.g., Skyscanner), and so on. Recommendation engines have become a key component in today’s online-first world, and if engineered properly, they can significantly increase revenue for commercial applications.
Although many different approaches exist to building a recommendation engine, in this tutorial we will be focusing on one of the most widely used ones, collaborative filtering. We will be using a movie dataset to build a simple movie recommender system leveraging Memgraph and Cypher.
Prerequisites
To follow along, you will need the following:
- A local installation of Memgraph. You can refer to the Memgraph documentation.
- Basic knowledge of the Cypher query language.
- [OPTIONAL] A local installation of Memgraph Lab if you would like to visualize your results.
- [OPTIONAL] If you don’t want to install Memgraph and Memgraph Lab on your own, you can create a free account on Memgraph Cloud.
What Is a Collaborative Filtering Recommendation Engine?
Collaborative filtering is a method for building recommendation engines that relies on past interactions between users and items to generate new recommendations. For example, when a recommender system is trying to recommend item “i” to a user “x”, it will first look for users who share similar rating patterns and then use the ratings made by those “similar” users to predict the rating for item “i” for user “x”. This is known as user-based collaborative filtering.
Another approach is item-based collaborative filtering (e.g., “people who bought this also bought this”), which calculates the similarity between items based on users’ ratings of those items. This method was pioneered by Amazon in 1998.
In this tutorial, we will focus on user-based collaborative filtering.
Step 1 — Building a Movie Recommender Data Model
The first step is to build our data model. For this example, we’ll be using the MovieLens dataset containing a few hundred movies and users. Our model will contain three different types of data: Movie, User, and Genre. Movies have two properties, id and title. Users also have two properties, id and name. Finally, genres will only have one property, name.
Each movie will be connected to one or more genres by an edge of type :ofGenre. Users will be connected to movies by an edge of type :Rating that has one property, score, representing a rating between 0 and 5 given by a user to a movie.
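As a rough illustration of this model, the whole shape can be written as a single Cypher pattern. The query below is only a sketch: it assumes the labels, edge types, and property names described above, leaves edge directions unspecified, and returns nothing until the data has been imported in Step 2.

```cypher
// Walk from a user through a rating to a movie and on to its genre.
MATCH (u:User)-[r:Rating]-(m:Movie)-[:ofGenre]-(g:Genre)
RETURN u.name AS user, r.score AS score, m.title AS movie, g.name AS genre
LIMIT 5;
```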
Now that we have our model, we’re ready to import the MovieLens data into Memgraph.
Step 2 — Importing Data Into Memgraph
The simplest way to import our dataset into Memgraph is by using Cypher queries.
Before starting our import, in order to improve our search performance when looking for specific movies, users, and genres, we will create an index on the id property of user and movie vertices, and on the name property of genre vertices.
[NOTE] A database index essentially creates a redundant copy of some of the data in the database to improve the efficiency of searches for the indexed data. However, this comes at the cost of additional storage space and more writes, so deciding what to index and what not to index is an important decision.
To create our index on the id property of user vertices, we will use the following Cypher query:
CREATE INDEX ON :User(id);
To create our index on the id property of movie vertices, we will use the following Cypher query:
CREATE INDEX ON :Movie(id);
Finally, to create our index on the name property of genre vertices, we will use the following Cypher query:
CREATE INDEX ON :Genre(name);
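To check that the three indexes exist, Memgraph can list them. This is a small sketch that assumes your Memgraph version supports the SHOW INDEX INFO command:

```cypher
// Lists the indexes currently defined in the database.
SHOW INDEX INFO;
```

The output should contain one entry for each of the label and property pairs indexed above.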
Now that our indexes are created, let’s import our data.
First, we will import users. To do this, we will use this Cypher query. This query will create vertices with label User and properties id and name.
Second, we will import genres using this query. This query will create vertices with label Genre and property name.
Third, we will import movies and connect them to genres. To do this, we will use this Cypher query. As you can see, this query is slightly different from the other two. The query first creates a vertex with label Movie and properties id and title. It then matches the movie vertex with the appropriate genre vertex and creates an edge of type :ofGenre between the two.
Finally, we will import user ratings using this Cypher query. As you can see, the query is similar to the last one. It matches two vertices, in this case a User and a Movie, and creates a :Rating relationship between the two.
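The four import steps can be sketched with one statement each. This is not the content of the actual import files: the ids, names, titles, and score are made-up sample values, and the edge directions (movie to genre, user to movie) are an assumption, since the queries in this tutorial match both edge types without a direction. Run it on an empty scratch database rather than alongside the real dataset.

```cypher
// 1. A user vertex (hypothetical sample values).
CREATE (:User {id: 1, name: "Sample User"});

// 2. A genre vertex.
CREATE (:Genre {name: "Sample Genre"});

// 3. A movie vertex, then match its genre and connect the two.
CREATE (m:Movie {id: 1, title: "Sample Movie"})
WITH m
MATCH (g:Genre {name: "Sample Genre"})
CREATE (m)-[:ofGenre]->(g);

// 4. Match an existing user and movie and connect them with a rating.
MATCH (u:User {id: 1}), (m:Movie {id: 1})
CREATE (u)-[:Rating {score: 4.0}]->(m);
```

Matching on id and name is where the indexes created earlier pay off, because each lookup can use the index instead of scanning every vertex with that label.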
Now that all the data is stored inside Memgraph, we can start building our collaborative filtering query.
Step 3 — Building a Collaborative Filtering Algorithm
In this step, we will create a new user named Alice who has already rated a few movies, and we will build a user-based collaborative filtering algorithm that recommends her a few movies to watch next.
To create the user, the following query has to be executed:
CREATE (:User {id:1000, name:"Alice"});
Now that we have our target user, let’s generate her watch history, which will contain the ratings she assigned to the movies she watched. To do this, execute the following queries:
MATCH (u:User{id:1000}), (m:Movie{title:"Trois couleurs : Rouge"})
MERGE (u)-[:Rating{score:3.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"20,000 Leagues Under the Sea"})
MERGE (u)-[:Rating{score:1.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Star Trek: Generations"})
MERGE (u)-[:Rating{score:0.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Rebecca"})
MERGE (u)-[:Rating{score:3.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"The 39 Steps"})
MERGE (u)-[:Rating{score:4.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Faster, Pussycat! Kill! Kill!"})
MERGE (u)-[:Rating{score:3.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Once Were Warriors"})
MERGE (u)-[:Rating{score:3.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Sleepless in Seattle"})
MERGE (u)-[:Rating{score:4.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Don Juan DeMarco"})
MERGE (u)-[:Rating{score:4.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Jack & Sarah"})
MERGE (u)-[:Rating{score:1.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Mr. Holland's Opus"})
MERGE (u)-[:Rating{score:2.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"The Getaway"})
MERGE (u)-[:Rating{score:3.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Color of Night"})
MERGE (u)-[:Rating{score:4.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Reality Bites"})
MERGE (u)-[:Rating{score:2.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Notorious"})
MERGE (u)-[:Rating{score:3.5}]-(m);
Each of these queries matches our target user with a movie she watched and creates a :Rating relationship between the two with a score between 0 and 5. As you can see, we have used a new Cypher clause, MERGE. The MERGE clause acts as a combination of MATCH and CREATE: it tries to find the pattern in the graph, and if the pattern exists, nothing is created; otherwise, it creates the pattern.
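A quick way to see this behavior is to run one of the MERGE queries a second time and count the edges afterwards. The sketch below reuses one of Alice's ratings from above; rating_count is a name introduced here for illustration.

```cypher
// Re-running the same MERGE finds the existing pattern and creates nothing.
MATCH (u:User {id: 1000}), (m:Movie {title: "Notorious"})
MERGE (u)-[:Rating {score: 3.5}]-(m);

// Count the rating edges between Alice and the movie.
MATCH (:User {id: 1000})-[r:Rating]-(:Movie {title: "Notorious"})
RETURN COUNT(r) AS rating_count;
```

If the title is unique in the dataset, the count should stay at one no matter how often the MERGE is repeated. Keep in mind that MERGE matches the entire pattern, including the score property, so merging the same user and movie with a different score would create a second edge instead of updating the first.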
Now that we know which movies Alice watched and how she scored them, we’re ready to recommend her a few other movies to watch next. To do so, we will first need to identify other users in our graph who watched and scored the same movies as Alice. Then, we will rank the movies those “like-minded” users rated highly as recommendations for Alice. To do this, we will use the following Cypher query:
MATCH (u:User {id:1000})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other.id AS other_id,
AVG(ABS(r.score-other_r.score)) AS similarity,
COUNT(*) AS similar_user_count
WHERE similar_user_count > 2
WITH other_id ORDER BY similarity LIMIT 10
WITH COLLECT(other_id) AS similar_user_set
MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
WHERE fellow_user.id IN similar_user_set
WITH some_movie, AVG(fellow_rate.score) AS avg_score
RETURN some_movie.title AS title,
avg_score ORDER BY avg_score DESC;
This query is a little more complex than what we’ve seen so far, so let’s break it down.
In the first part of the query, the MATCH clause finds our target user with id 1000 (Alice) and expands to all other users who rated at least one of the same movies. For each of those users, we compute similarity as the average absolute difference between the target user’s score and the other user’s score over the movies they both rated, so a lower value means more similar taste. Despite its name, similar_user_count holds the number of movies the two users have in common, and the similar_user_count > 2 condition keeps only users who share more than two movies with the target user.
MATCH (u:User {id:1000})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other.id AS other_id,
AVG(ABS(r.score-other_r.score)) AS similarity,
COUNT(*) AS similar_user_count
WHERE similar_user_count > 2
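To make the similarity measure concrete, here is a small worked example that does not touch the graph at all. The three score pairs are hypothetical: they stand for three movies rated by both Alice and one other user.

```cypher
// Each map holds the two scores given to one shared movie (made-up values).
UNWIND [
  {alice: 3.0, other: 4.0},
  {alice: 4.5, other: 4.0},
  {alice: 1.0, other: 2.5}
] AS pair
RETURN AVG(ABS(pair.alice - pair.other)) AS similarity,
       COUNT(*) AS similar_user_count;
```

The absolute differences are 1.0, 0.5, and 1.5, which gives an average of 1.0 over 3 shared movies. Since 3 is greater than 2, this user would pass the filter. A user who gave exactly the same scores as Alice would get a similarity of 0, the best possible value.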
The second part of the query simply takes the 10 (or fewer) most similar users, that is, those with the lowest similarity values, and puts them in a list.
WITH other_id ORDER BY similarity LIMIT 10
WITH COLLECT(other_id) AS similar_user_set
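The same selection step can be tried in isolation on literal data. In this sketch the four candidate users and their similarity values are invented, and the limit is reduced to 2 to keep the example short.

```cypher
// Hypothetical candidates; a lower similarity value means a closer match.
UNWIND [
  {id: 1, similarity: 0.8},
  {id: 2, similarity: 0.2},
  {id: 3, similarity: 1.5},
  {id: 4, similarity: 0.5}
] AS candidate
WITH candidate.id AS other_id, candidate.similarity AS similarity
WITH other_id ORDER BY similarity LIMIT 2
RETURN COLLECT(other_id) AS similar_user_set;
```

Because ORDER BY sorts in ascending order by default, the two users that remain are those with the smallest values, ids 2 and 4.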
The last part of the query starts with the most similar users (fellow_user, drawn from the similar_user_set built in the second part) and computes the average score those users gave to each movie they rated. The RETURN clause then returns the titles and average scores in descending order of average score.
MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
WHERE fellow_user.id IN similar_user_set
WITH some_movie, AVG(fellow_rate.score) AS avg_score
RETURN some_movie.title AS title, avg_score ORDER BY avg_score DESC;
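As we can see below, the top two movies we should recommend to our target user, Alice, are Space Jam and Mr. Smith Goes to Washington. Note that the query does not filter out movies Alice has already rated, which is why titles such as The 39 Steps and Sleepless in Seattle also appear in the list.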
+--------------------------------+-----------+
| title                          | avg_score |
+--------------------------------+-----------+
| "Space Jam"                    | 5         |
| "Mr. Smith Goes to Washington" | 5         |
| "The 39 Steps"                 | 4.71429   |
| "Mission: Impossible"          | 4.5       |
| "Dead Man"                     | 4.5       |
| "Sleepless in Seattle"         | 4.25      |
| "Nell"                         | 4.16667   |
| "Don Juan DeMarco"             | 4         |
| "Romeo Is Bleeding"            | 4         |
| "A Time to Kill"               | 4         |
| "The Godfather"                | 4         |
| ...                            | ...       |
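To exclude titles Alice has already rated, one option is to collect their ids first and filter on them at the end. The following variation is a sketch under that assumption: it has not been run against the tutorial dataset, and seen and seen_ids are names introduced here.

```cypher
// Collect the ids of the movies Alice has already rated.
MATCH (:User {id: 1000})-[:Rating]-(seen:Movie)
WITH COLLECT(seen.id) AS seen_ids
// Same similarity computation as before, carrying seen_ids along.
MATCH (u:User {id: 1000})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH seen_ids,
     other.id AS other_id,
     AVG(ABS(r.score - other_r.score)) AS similarity,
     COUNT(*) AS similar_user_count
WHERE similar_user_count > 2
WITH seen_ids, other_id ORDER BY similarity LIMIT 10
WITH seen_ids, COLLECT(other_id) AS similar_user_set
// Rank the movies of similar users, skipping the ones Alice has rated.
MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
WHERE fellow_user.id IN similar_user_set
  AND NOT some_movie.id IN seen_ids
WITH some_movie, AVG(fellow_rate.score) AS avg_score
RETURN some_movie.title AS title,
       avg_score ORDER BY avg_score DESC;
```

With this filter in place, entries such as The 39 Steps, Sleepless in Seattle, and Don Juan DeMarco would be expected to drop out of the result, since Alice rated them in Step 3.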
Conclusion
Congratulations! You just built a simple recommendation engine by leveraging Memgraph and Cypher. Along the way, you learned how to use the basic variant of a collaborative filtering algorithm to deliver user-based recommendations. If you are interested in more tutorials like this one, you could check out our tutorial on how to build a route planning application using Breadth-First Search and Dijkstra’s algorithm.
Published at DZone with permission of Marko Budiselic. See the original article here.