How to Build a Collaborative Filtering Recommender Engine with Memgraph and Cypher

In this tutorial, you will learn how to build a simple movie recommender system leveraging Memgraph, Cypher, and a user-based collaborative filtering algorithm.

Introduction

A recommendation engine is a system that tries to suggest relevant items to users. These could be movies (e.g., Netflix), products (e.g., Amazon), flights (e.g., Skyscanner), and so on. Recommendation engines have become a key component of today’s online-first world, and when engineered properly, they can significantly increase revenue for commercial applications.

Although there are many different approaches to building a recommendation engine, this tutorial focuses on one of the most widely used: collaborative filtering. We will use a movie dataset to build a simple movie recommender system with Memgraph and Cypher.

Prerequisites

To follow along, you will need the following:

What Is a Collaborative Filtering Recommendation Engine?

Collaborative filtering is a method for building recommendation engines that relies on past interactions between users and items to generate new recommendations. For example, when a recommender system is trying to recommend item “i” to a user “x”, it will first look for users who share similar rating patterns and then use the ratings made by those “similar” users to predict the rating for item “i” for user “x”. This is known as user-based collaborative filtering.
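
To make the idea concrete, here is a minimal Python sketch of user-based prediction. It is only an illustration under stated assumptions: the toy ratings, the names `distance` and `predict`, and the choice of averaging the `k` closest users' scores are all hypothetical, and real systems often use weighted or normalized variants.

```python
from statistics import mean

# Hypothetical toy data: user -> {item: score}. User "x" has not rated item "i".
ratings = {
    "x": {"a": 4.0, "b": 2.0},
    "u1": {"a": 4.0, "b": 2.5, "i": 5.0},
    "u2": {"a": 1.0, "b": 5.0, "i": 1.0},
    "u3": {"a": 3.5, "b": 2.0, "i": 4.0},
}


def distance(ratings, user, other):
    """Mean absolute score difference over co-rated items (None if none shared)."""
    shared = ratings[user].keys() & ratings[other].keys()
    if not shared:
        return None
    return mean(abs(ratings[user][item] - ratings[other][item]) for item in shared)


def predict(ratings, user, item, k=2):
    """Predict a score for `item` by averaging the k closest users' ratings of it."""
    candidates = []
    for other, scores in ratings.items():
        if other == user or item not in scores:
            continue
        d = distance(ratings, user, other)
        if d is not None:
            candidates.append((d, other))
    nearest = sorted(candidates)[:k]  # smallest distance first
    if not nearest:
        return None
    return mean(ratings[other][item] for _, other in nearest)


print(predict(ratings, "x", "i"))  # 4.5: u1 and u3 rate like x, u2 does not
```

Here "u1" and "u3" rate items "a" and "b" almost exactly as "x" does, so their scores for "i" drive the prediction, while the dissimilar "u2" is ignored.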

Another approach is item-based collaborative filtering (e.g., “people who bought this also bought that”), which calculates the similarity between items based on users’ ratings of those items. This method was pioneered by Amazon in 1998.
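
For contrast with the user-based approach, the following Python sketch compares two items through the users who rated both. Cosine similarity is used here as one common choice, not necessarily the measure any particular production system uses; the toy data and the function name `item_similarity` are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: score}.
ratings = {
    "u1": {"book": 5.0, "lamp": 4.0},
    "u2": {"book": 4.0, "lamp": 5.0, "pen": 1.0},
    "u3": {"book": 1.0, "pen": 5.0},
}


def item_similarity(ratings, item_a, item_b):
    """Cosine similarity of two items over the users who rated both of them."""
    pairs = [
        (scores[item_a], scores[item_b])
        for scores in ratings.values()
        if item_a in scores and item_b in scores
    ]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_a = sqrt(sum(a * a for a, _ in pairs))
    norm_b = sqrt(sum(b * b for _, b in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# "book" and "lamp" are rated alike by the same users; "book" and "pen" are not.
print(item_similarity(ratings, "book", "lamp") > item_similarity(ratings, "book", "pen"))
```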

In this tutorial, we will focus on user-based collaborative filtering.

Step 1 — Building a Movie Recommender Data Model

The first step is to build our data model. For this example, we’ll be using the MovieLens dataset containing a few hundred movies and users. Our model will contain three different types of data: Movie, User, and Genre. Movies have two properties, id and title. Users also have two properties, id and name. Finally, genres have only one property, name.

Each movie will be connected to its genres by edges of type :ofGenre. Users will be connected to movies by edges of type :Rating, each carrying a single property, score, that represents the rating between 0 and 5 given by the user to the movie.
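
As a rough mental model, the same schema can be written down as plain Python types. This is only a sketch for orientation: the class names mirror the labels and edge types above, but the sample values (ids, the "Drama" genre) are hypothetical, and Memgraph itself does not require any such declarations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genre:
    name: str


@dataclass(frozen=True)
class Movie:
    id: int
    title: str


@dataclass(frozen=True)
class User:
    id: int
    name: str


@dataclass(frozen=True)
class OfGenre:
    """Edge (:Movie)-[:ofGenre]-(:Genre)."""
    movie: Movie
    genre: Genre


@dataclass(frozen=True)
class Rating:
    """Edge (:User)-[:Rating {score}]-(:Movie), with score between 0 and 5."""
    user: User
    movie: Movie
    score: float

    def __post_init__(self):
        if not 0.0 <= self.score <= 5.0:
            raise ValueError(f"score must be between 0 and 5, got {self.score}")


# Hypothetical sample values, for illustration only.
rebecca = Movie(id=1, title="Rebecca")
edge = OfGenre(movie=rebecca, genre=Genre(name="Drama"))
rating = Rating(user=User(id=1000, name="Alice"), movie=rebecca, score=3.0)
```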

Now that we have our model, we’re ready to import the MovieLens data into Memgraph.

Step 2 — Importing Data Into Memgraph

The simplest way to import our dataset into Memgraph is by using Cypher queries.

Before starting our import, in order to improve our search performance when looking for specific movies, users, and genres, we will create an index on the id property of user and movie vertices, and the name property of genre vertices.

[NOTE] A database index essentially creates a redundant copy of some of the data in the database to improve the efficiency of searches for the indexed data. However, this comes at the cost of additional storage space and more writes, so deciding what to index and what not to index is an important decision.
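
The trade-off in the note can be illustrated with a small Python analogy. This is not how Memgraph implements indexes; it is a hypothetical sketch in which a dictionary plays the role of the index, and the names `find_by_scan`, `index_by_id`, and `add_user` are made up for the example.

```python
# Hypothetical in-memory "table" of users.
users = [{"id": i, "name": f"user{i}"} for i in range(1000)]


def find_by_scan(users, user_id):
    """Without an index: inspect records one by one until the id matches."""
    for user in users:
        if user["id"] == user_id:
            return user
    return None


# With an index: a redundant id -> record mapping, at the cost of extra storage.
index_by_id = {user["id"]: user for user in users}


def add_user(users, index, user):
    """Every insert now performs a second write to keep the index in sync."""
    users.append(user)
    index[user["id"]] = user


add_user(users, index_by_id, {"id": 1000, "name": "Alice"})
print(find_by_scan(users, 1000) is index_by_id[1000])  # True, same record either way
```

Both lookups return the same record; the indexed one simply avoids walking the whole list, while each insert pays for one more write.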

To create our index on the id property of user vertices, we will use the following Cypher query:

CREATE INDEX ON :User(id);



To create our index on the id property of movie vertices, we will use the following Cypher query:

CREATE INDEX ON :Movie(id);



Finally, to create our index on the name property of genre vertices, we will use the following Cypher query:

CREATE INDEX ON :Genre(name);



Now that our indexes are created, let’s import our data.

First, we will import users. To do this, we will use this Cypher query. This query will create vertices with the label User and the properties id and name.

Second, we will import genres using this query. This query will create vertices with the label Genre and the property name.

Third, we will import movies and connect them to genres. To do this, we will use this Cypher query. As you can see, this query is slightly different from the other two. It first creates a vertex with the label Movie and the properties id and title. It then matches the movie vertex with the appropriate genre vertex and creates an edge of type :ofGenre between the two.

Finally, we will import user ratings using this Cypher query. As you can see, the query is similar to the last one. It matches two vertices, in this case a User and a Movie, and creates a :Rating relationship between the two.

Now that all the data is stored inside Memgraph, we can start building our collaborative filtering query.

Step 3 — Building a Collaborative Filtering Algorithm

In this step, we will create a new user named Alice who has already rated a few movies, and we will build a user-based collaborative filtering algorithm that will recommend her a few movies to watch next.

To create the user, the following query has to be executed:

CREATE (:User {id:1000, name:"Alice"});



Now that we have our target user, let’s generate her watch history, which will contain the ratings she assigned to the movies she watched. To do this, execute the following queries:

MATCH (u:User{id:1000}), (m:Movie{title:"Trois couleurs : Rouge"})
MERGE (u)-[:Rating{score:3.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"20,000 Leagues Under the Sea"})
MERGE (u)-[:Rating{score:1.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Star Trek: Generations"})
MERGE (u)-[:Rating{score:0.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Rebecca"})
MERGE (u)-[:Rating{score:3.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"The 39 Steps"})
MERGE (u)-[:Rating{score:4.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Faster, Pussycat! Kill! Kill!"})
MERGE (u)-[:Rating{score:3.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Once Were Warriors"})
MERGE (u)-[:Rating{score:3.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Sleepless in Seattle"})
MERGE (u)-[:Rating{score:4.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Don Juan DeMarco"})
MERGE (u)-[:Rating{score:4.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Jack & Sarah"})
MERGE (u)-[:Rating{score:1.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Mr. Holland's Opus"})
MERGE (u)-[:Rating{score:2.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"The Getaway"})
MERGE (u)-[:Rating{score:3.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Color of Night"})
MERGE (u)-[:Rating{score:4.0}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Reality Bites"})
MERGE (u)-[:Rating{score:2.5}]-(m);
MATCH (u:User{id:1000}), (m:Movie{title:"Notorious"})
MERGE (u)-[:Rating{score:3.5}]-(m);



Each of these queries matches our target user with a movie she watched and creates a :Rating relationship between the two with a score between 0 and 5. As you can see, we have used a new Cypher clause, MERGE. The MERGE clause acts as a combination of MATCH and CREATE: it tries to find the pattern in the graph, and if the pattern exists, nothing is created; if it doesn’t, MERGE creates it.
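
The match-or-create behavior can be sketched in Python as follows. This is an analogy only, assuming a plain list stands in for the graph's edges; the function `merge_rating` and the dictionary layout are hypothetical. It also shows a consequence of standard Cypher MERGE semantics that is easy to miss: the whole pattern is matched, properties included, so merging the same user and movie with a different score creates a second edge rather than updating the first.

```python
def merge_rating(edges, user_id, movie_title, score):
    """Return (edge, created): reuse an identical edge if present, else create it."""
    for edge in edges:
        if (edge["user"], edge["movie"], edge["score"]) == (user_id, movie_title, score):
            return edge, False  # pattern found, nothing is created
    edge = {"user": user_id, "movie": movie_title, "score": score}
    edges.append(edge)
    return edge, True  # pattern missing, so it is created


edges = []
_, created_first = merge_rating(edges, 1000, "Rebecca", 3.0)
_, created_again = merge_rating(edges, 1000, "Rebecca", 3.0)
_, created_other = merge_rating(edges, 1000, "Rebecca", 4.0)

print(created_first, created_again, created_other)  # True False True
print(len(edges))  # 2: the different score did not match the existing edge
```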

Now that we know which movies Alice watched and how she scored them, we’re ready to recommend her a few other movies to watch next. To do so, we will first need to identify other users in our graph who watched and scored the same movies as Alice. Then we will recommend to Alice the movies that those “like-minded” users rated most highly, with the aim of surfacing titles she hasn’t watched yet. To do this, we will use the following Cypher query:

MATCH (u:User {id:1000})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other.id AS other_id,
     AVG(ABS(r.score-other_r.score)) AS similarity,
     COUNT(*) AS similar_user_count
  WHERE similar_user_count > 2
WITH other_id ORDER BY similarity LIMIT 10
WITH COLLECT(other_id) AS similar_user_set
MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
  WHERE fellow_user.id IN similar_user_set
WITH some_movie, AVG(fellow_rate.score) AS avg_score
RETURN some_movie.title AS title,
       avg_score ORDER BY avg_score DESC;



This query is a little more complex than what we’ve seen so far, so let’s break it down.

In the first part of the query, the MATCH clause finds our target user with id 1000 (aka Alice) and expands to all other users who watched and rated at least one of the same movies. Once all other users are collected, we compute similarity as the average absolute difference between the target user’s score and the other user’s score on the movies they have both rated, so a lower value means a closer match. The condition similar_user_count > 2 then keeps only users who have more than two movies in common with the target user.

MATCH (u:User {id:1000})-[r:Rating]-(m:Movie)-[other_r:Rating]-(other:User)
WITH other.id AS other_id,
     AVG(ABS(r.score-other_r.score)) AS similarity,
     COUNT(*) AS similar_user_count
  WHERE similar_user_count > 2


The second part of the query orders users by that value in ascending order, takes the 10 (or fewer) most similar users, and puts them in a list.

WITH other_id ORDER BY similarity LIMIT 10
WITH COLLECT(other_id) AS similar_user_set



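The last part of the query starts with the most similar users (fellow_user, drawn from the similar_user_set built in the second part) and collects the movies these users rated, computing the average score for each movie. The RETURN clause then returns the titles and average scores in descending order of average score. Note that the query as written does not filter out movies Alice has already rated, which is why titles such as The 39 Steps can appear in the results.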

MATCH (some_movie:Movie)-[fellow_rate:Rating]-(fellow_user:User)
  WHERE fellow_user.id IN similar_user_set
WITH some_movie, AVG(fellow_rate.score) AS avg_score
RETURN some_movie.title AS title, avg_score ORDER BY avg_score DESC;



As we can see below, the top two movies we should recommend to our target user, Alice, are Space Jam and Mr. Smith Goes to Washington.

+--------------------------------+-----------+
| title                          | avg_score |
+--------------------------------+-----------+
| "Space Jam"                    | 5         |
| "Mr. Smith Goes to Washington" | 5         |
| "The 39 Steps"                 | 4.71429   |
| "Mission: Impossible"          | 4.5       |
| "Dead Man"                     | 4.5       |
| "Sleepless in Seattle"         | 4.25      |
| "Nell"                         | 4.16667   |
| "Don Juan DeMarco"             | 4         |
| "Romeo Is Bleeding"            | 4         |
| "A Time to Kill"               | 4         |
| "The Godfather"                | 4         |
| ...                            | ...       |
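
For readers who want to see the three parts of the Cypher query outside the database, the following Python sketch mirrors them on a tiny in-memory dataset. It is an approximation under stated assumptions: the user names and scores are hypothetical, `recommend` is an illustrative name, and the optional `skip_rated` flag is an addition that is not in the Cypher query (it drops movies the target user has already rated).

```python
from collections import defaultdict
from statistics import mean


def recommend(ratings, target, min_shared=3, top_k=10, skip_rated=False):
    """Mirror the three parts of the Cypher query on a dict of user -> {movie: score}."""
    mine = ratings[target]

    # Part 1: average absolute score difference on co-rated movies (lower = closer),
    # keeping only users with at least `min_shared` movies in common (count > 2).
    distances = []
    for other, theirs in ratings.items():
        if other == target:
            continue
        shared = mine.keys() & theirs.keys()
        if len(shared) >= min_shared:
            diff = mean(abs(mine[movie] - theirs[movie]) for movie in shared)
            distances.append((diff, other))

    # Part 2: keep the `top_k` closest users (ORDER BY similarity LIMIT 10).
    similar_users = [user for _, user in sorted(distances)[:top_k]]

    # Part 3: average the similar users' scores per movie, highest first.
    scores = defaultdict(list)
    for user in similar_users:
        for movie, score in ratings[user].items():
            if skip_rated and movie in mine:
                continue  # assumption: not part of the original query
            scores[movie].append(score)
    averaged = [(movie, mean(values)) for movie, values in scores.items()]
    return sorted(averaged, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data reusing a few titles from this tutorial.
ratings = {
    "alice": {"Rebecca": 3.0, "Notorious": 3.5, "The 39 Steps": 4.5},
    "bob": {"Rebecca": 3.0, "Notorious": 4.0, "The 39 Steps": 4.5, "Dead Man": 4.5},
    "carol": {"Rebecca": 1.0, "Notorious": 1.0, "The 39 Steps": 2.0, "Nell": 5.0},
    "dave": {"Rebecca": 3.0, "Nell": 2.0},
}

print(recommend(ratings, "alice", top_k=1, skip_rated=True))  # [('Dead Man', 4.5)]
```

With `top_k=1`, only "bob" qualifies as the closest user, and "dave" is excluded earlier because he shares just one movie with "alice". Leaving `skip_rated` at its default reproduces the behavior of the Cypher query, in which already-rated movies can appear in the output.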



Conclusion

Congratulations! You just built a simple recommendation engine by leveraging Memgraph and Cypher. Along the way, you learned how to use the basic variant of a collaborative filtering algorithm to deliver user-based recommendations. If you are interested in more tutorials like this one, you could check out our tutorial on how to build a route planning application using Breadth-First Search and Dijkstra’s algorithm.

Published at DZone with permission of Marko Budiselic. See the original article here.
