Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Playing With Graphs and Logic Systems

DZone's Guide to

Playing With Graphs and Logic Systems

Let's take a look at how one person builds a graph query engine to see how it works just for fun.

· Database Zone ·
Free Resource

RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.  

Recently, I have been playing with graphs a bit, trying to understand them in more depth. Because I learn much better by doing, I thought that I would build a toy graph query engine to see how that works. I loaded the MovieLens small data set into a set of C# classes and started playing with them.

Here is what the source data looks like:

public class Movie
{
    public int MovieId;
    public string Title;
    public string[] Genres;
}

public class UserRating
{
    public int MovieId;
    public float Rating;
    public DateTime Timestamp;
    public List<string> Tags = new List<string>();
}

public class User
{
    public int UserId;
    public Dictionary<int, UserRating> Ratings = new Dictionary<int, UserRating>();
}


var users = new Dictionary<int, User>();
var movies = new Dictionary<int, Movie>();

I’m not dealing with typical issues, such as how to fetch the data, optimizing indexes, etc. Instead, I want to focus solely on the problem of finding patterns in the graph.

Here is a simple example of a pattern in the graph:

(userA:User)-[:Rated]->(movie:Movie)<-[:Rated]-(userB:User)

The syntax is called Cypher, which is commonly used for graph queries.

What we are trying to find here is a set of triads. User A who rated a movie that was also rated by user B. The result of this query is a list of tuples matching (userA, movie, userB).

This is really similar to the way I remember learning Prolog, so I thought about giving it a shot and solving the problem in this way.

The first thing to do is to break the query itself into independent steps:

(userA:User)-[:Rated]->(movie:Movie) AND (userB:User)-[:Rated]->(movie:Movie)

Note that in this case, the first and second queries are exactly the same, but now they are somewhat easier to reason about. We just need to do the match ups property, here is how I would write the code:

var user_and_movie = (
   from user in users.Values
   from rating in user.Ratings.Values
   select new
   {
       user,
       movie = movies[rating.MovieId]
   }
 ).ToList();

 var one = user_and_movie;
 var two = user_and_movie;

 var results = (
    from userA in one
    join userB in two
      on userA.movie equals userB.movie
    where userA.user != userB.user
    select new
    {
        userA = userA.user.UserId,
        userB = userB.user.UserId,
        movie = userA.movie.MovieId
    }
 ).ToList();

This query can take a while to run because on the small dataset (with just 100,004 recommendations and 671 users), there are over 6.2 million such connections. And yes, I used join intentionally, because it showcases the interesting problem of cartesian product.

Now, these queries aren’t really interesting and they can be quite expensive. A better query would be to find the set of movies that were rated by both user 1 and user 306. This can be done as simple as changing the previous code starting location:

var one = (
    from user in new[] { users[1] }
    from rating in user.Ratings.Values
    select new
    {
        user,
        movie = movies[rating.MovieId]
    }
).ToList();

var two = (
    from user in new[] { users[306] }
    from rating in user.Ratings.Values
    select new
    {
        user,
        movie = movies[rating.MovieId]
    }
).ToList();

Again, this is a pretty simple scenario. A more complex one would be to find a list of movies a particular user has not rated that were rated by people who liked the same movies as this user. As a query, this will look roughly like this:

(userA:User)-[:Rated(Rating >= 4)]->(:Movie)<-[:Rated(Rating >= 4)]-(userB:User) AND (userB:User)-[:Rated(Rating >= 4)]->(notRatedByA:Movie) AND NOT (userA:User)-[:Rated]->(notRatedByA:Movie)

Note that this merely specifies the first part and finds me users that liked the same movies as userA. The second part is a bit more complex, we want to find movies rated by the second users and exclude movies rated by the first. Let’s break it into its component parts, shall we?

Here is the code for the first clause: (userA:User)-[:Rated(Rating >= 4)]->(:Movie)<-[:Rated(Rating >= 4)]-(userB:User)

var one = (
    from user in new[] { users[1] }
    from rating in user.Ratings.Values
    where rating.Rating >= 4
    select new
    {
        user,
        movie = movies[rating.MovieId]
    }
).ToList();

var two = (
    from user in users.Values   
    from rating in user.Ratings.Values
    where rating.Rating >= 4
    select new
    {
        user,
        movie = movies[rating.MovieId]
    }
).ToList();


var clauseA = (
                from userA in one
                join userB in two
                on userA.movie equals userB.movie
                where userA.user != userB.user
                select new
                {
                    userA = userA.user,
                    userB = userB.user
                }
            ).ToList();

As you can see, the output of this code is a set of (userA, userB). Now, let’s go to the second one, shall we? We already have a match on userB in this case, so we can start evaluating that. Here is the next stage: (userB:User)-[:Rated(Rating >= 4)]->(notRatedByA:Movie)

var clauseB = from matches in clauseA
              from rating in matches.userB.Ratings.Values
              where rating.Rating >= 4
              select new
              {
                  matches.userA,
                  matches.userB,
                  notRatedByA = movies[rating.MovieId]
              };

Now we have the last stage, where we need to filter things out:

var clauseC = from matches in clauseB
              where matches.userA.Ratings.ContainsKey(matches.notRatedByA.MovieId)
              select matches;

And now we have the final results.

For me, thinking about these kinds of queries as a “fill in the blanks” makes the most sense.

Get comfortable using NoSQL in a free, self-directed learning course provided by RavenDB. Learn to create fully-functional real-world programs on NoSQL Databases. Register today.

Topics:
database ,graph query ,c# ,query engine ,cypher ,graphs ,cartesian ,join

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}