Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

# Playing With Graphs and Logic Systems

DZone 's Guide to

# Playing With Graphs and Logic Systems

### Let's take a look at how one person builds a graph query engine to see how it works just for fun.

· Database Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Recently, I have been playing with graphs a bit, trying to understand them in more depth. Because I learn much better by doing, I thought that I would build a toy graph query engine to see how that works. I loaded the MovieLens small data set into a set of C# classes and started playing with them.

Here is what the source data looks like:

``````public class Movie
{
public int MovieId;
public string Title;
public string[] Genres;
}

public class UserRating
{
public int MovieId;
public float Rating;
public DateTime Timestamp;
public List<string> Tags = new List<string>();
}

public class User
{
public int UserId;
public Dictionary<int, UserRating> Ratings = new Dictionary<int, UserRating>();
}

var users = new Dictionary<int, User>();
var movies = new Dictionary<int, Movie>();``````

I’m not dealing with typical issues, such as how to fetch the data, optimizing indexes, etc. Instead, I want to focus solely on the problem of finding patterns in the graph.

Here is a simple example of a pattern in the graph:

``(userA:User)-[:Rated]->(movie:Movie)<-[:Rated]-(userB:User)``

The syntax is called Cypher, which is commonly used for graph queries.

What we are trying to find here is a set of triads. User A who rated a movie that was also rated by user B. The result of this query is a list of tuples matching (userA, movie, userB).

This is really similar to the way I remember learning Prolog, so I thought about giving it a shot and solving the problem in this way.

The first thing to do is to break the query itself into independent steps:

``(userA:User)-[:Rated]->(movie:Movie) AND (userB:User)-[:Rated]->(movie:Movie)``

Note that in this case, the first and second queries are exactly the same, but now they are somewhat easier to reason about. We just need to do the match ups property, here is how I would write the code:

``````var user_and_movie = (
from user in users.Values
from rating in user.Ratings.Values
select new
{
user,
movie = movies[rating.MovieId]
}
).ToList();

var one = user_and_movie;
var two = user_and_movie;

var results = (
from userA in one
on userA.movie equals userB.movie
where userA.user != userB.user
select new
{
userA = userA.user.UserId,
userB = userB.user.UserId,
movie = userA.movie.MovieId
}
).ToList();``````

This query can take a while to run because on the small dataset (with just 100,004 recommendations and 671 users), there are over 6.2 million such connections. And yes, I used join intentionally, because it showcases the interesting problem of cartesian product.

Now, these queries aren’t really interesting and they can be quite expensive. A better query would be to find the set of movies that were rated by both user 1 and user 306. This can be done as simple as changing the previous code starting location:

``````var one = (
from user in new[] { users[1] }
from rating in user.Ratings.Values
select new
{
user,
movie = movies[rating.MovieId]
}
).ToList();

var two = (
from user in new[] { users[306] }
from rating in user.Ratings.Values
select new
{
user,
movie = movies[rating.MovieId]
}
).ToList();``````

Again, this is a pretty simple scenario. A more complex one would be to find a list of movies a particular user has not rated that were rated by people who liked the same movies as this user. As a query, this will look roughly like this:

``(userA:User)-[:Rated(Rating >= 4)]->(:Movie)<-[:Rated(Rating >= 4)]-(userB:User) AND (userB:User)-[:Rated(Rating >= 4)]->(notRatedByA:Movie) AND NOT (userA:User)-[:Rated]->(notRatedByA:Movie)``

Note that this merely specifies the first part and finds me users that liked the same movies as userA. The second part is a bit more complex, we want to find movies rated by the second users and exclude movies rated by the first. Let’s break it into its component parts, shall we?

Here is the code for the first clause: `(userA:User)-[:Rated(Rating >= 4)]->(:Movie)<-[:Rated(Rating >= 4)]-(userB:User)`

``````var one = (
from user in new[] { users[1] }
from rating in user.Ratings.Values
where rating.Rating >= 4
select new
{
user,
movie = movies[rating.MovieId]
}
).ToList();

var two = (
from user in users.Values
from rating in user.Ratings.Values
where rating.Rating >= 4
select new
{
user,
movie = movies[rating.MovieId]
}
).ToList();

var clauseA = (
from userA in one
on userA.movie equals userB.movie
where userA.user != userB.user
select new
{
userA = userA.user,
userB = userB.user
}
).ToList();``````

As you can see, the output of this code is a set of (userA, userB). Now, let’s go to the second one, shall we? We already have a match on userB in this case, so we can start evaluating that. Here is the next stage: `(userB:User)-[:Rated(Rating >= 4)]->(notRatedByA:Movie)`

``````var clauseB = from matches in clauseA
from rating in matches.userB.Ratings.Values
where rating.Rating >= 4
select new
{
matches.userA,
matches.userB,
notRatedByA = movies[rating.MovieId]
};``````

Now we have the last stage, where we need to filter things out:

``````var clauseC = from matches in clauseB
where matches.userA.Ratings.ContainsKey(matches.notRatedByA.MovieId)
select matches;``````

And now we have the final results.

For me, thinking about these kinds of queries as a “fill in the blanks” makes the most sense.

Topics:
database ,graph query ,c# ,query engine ,cypher ,graphs ,cartesian ,join

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.