Make Small Talk With Your Boss (With the Help of Graph-Based Recommendations)

In this article, we discuss how to create graph-based recommendations on a large dataset of music and artists.

We’re all used to the idea of digital content recommendations by now.

Netflix recommends films and shows we might want to watch based on our preferences and past viewing history. Spotify recommends songs and artists to us that we might want to listen to based on what we’ve played recently and/or most extensively. And Amazon recommends Kindle books we might want to read based on the authors and topics we’ve shown an interest in.

I like to think, though, that not only are graphs everywhere… but recommendations are everywhere, too. So I started thinking, “Where else could I apply recommendations in my own life?”

I’d been dying to load and explore a large music graph for a while now, so an idea started to form. I decided to try and figure out what the overlap might be between my musical tastes and those of Neo4j CEO Emil Eifrem. Do we like any music in common? Is the music he likes at all like the music I like? If I got stuck in a lift with Emil and had to make small talk, should I mention any of my favourite bands or recommend them to him?

First I had to find out what kind of music Emil likes, and this is where things get a little stalkerish. For the “stuck in a lift making small talk” thing to work, it has to seem completely natural and off-the-cuff, so I couldn’t just ask Emil what he listens to – I had to seek that information out elsewhere.

So I did a bit of sleuthing on social media, telling myself, “This is totally uncreepy and anyway he’ll never read this… right?”

Here’s what I found.

Emil’s public profile on Spotify lists a few playlists, which I used as my starting point. There were a few soundtracks that probably wouldn’t be too useful for my recommendations – “Moulin Rouge” and “Disney Hits” (most likely for car journeys with his daughters, though in our elevator scenario I could take a gamble and start humming “Let It Go”). The artist playlists gave me a better starting point – Robbie Williams, Robyn, Amanda Fondell and Kate Bush.

Emil’s public Facebook profile only shows one “Like” for a music artist – Loney Dear. The accounts he follows on Twitter didn’t turn up much else either; lots of tech-oriented folks, of course, but very few obvious musical artists. I only spotted James Blunt and Loney Dear (maybe he really likes Loney Dear).

So that’s my starting list of artists: Robbie Williams, Robyn, Amanda Fondell, Kate Bush, James Blunt and Loney Dear. Let’s see what I can do with it!

First, I loaded some of the data from the MusicBrainz open music encyclopedia into a graph. There were a few challenges here: it’s a fairly large dataset, stored in a normalized relational database, and it contains a lot of data that probably isn’t too relevant to my recommendations engine. In the end, I decided to just load a few key entities that would allow me to see the relationships between artists, their recordings and “tag” metadata.

Here's the graph model: 

Cypher

CALL db.schema()

Artists and Tags are pretty straightforward, but the others probably need a brief explanation.

    • Release represents a unique release (“Issuing”) of a product – specific to the date, country, label, barcode, medium, packaging, etc. This could be a vinyl album, a CD single, etc.
    • Recording is a representation of some distinct audio that was used on at least one Released track. Tracks aren’t represented in our data model, to keep the database size down, and because they weren’t as useful for helping us get to Tags, but they could be used to link Recordings to Releases.
    • Work is an intellectual or artistic creation, which can be performed by artists in one or more recordings. A cover song would have one Work, but many Recordings.
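
The Work/Recording distinction above can be illustrated with a query that looks for likely cover songs, i.e. Works recorded by more than one artist. This is only a sketch: it assumes the `HAS_RECORDING` and `HAS_WORK` relationship directions used in the queries later in this article, and it assumes `Work` nodes carry a `name` property, which isn't confirmed here. On a graph this size it may also take a while to run.

```cypher
// Sketch only: assumes (:Artist)-[:HAS_RECORDING]->(:Recording)-[:HAS_WORK]->(:Work)
// and a `name` property on Work nodes (an assumption).
MATCH (a:Artist)-[:HAS_RECORDING]->(:Recording)-[:HAS_WORK]->(w:Work)
// Count distinct artist nodes, not names, because names can be duplicated
WITH w, count(DISTINCT a) AS artist_count, collect(DISTINCT a.name) AS artist_names
WHERE artist_count > 1
RETURN w.name AS work, artist_count, artist_names
ORDER BY artist_count DESC
LIMIT 10
```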

Once I had the database set up, I started by labelling the artists I think Emil likes:

Cypher

MATCH (a:Artist)
WHERE a.name IN ['Kate Bush', 'Loney Dear', 'Robbie Williams', 'Robyn', 'Amanda Fondell', 'James Blunt']
SET a:EmilFave
RETURN a

Then I found some of my favourite artists and tagged them, too.

I have pretty eclectic musical tastes, so I tried to pick a range of styles and decades to give myself the best chance of finding interesting connections with the list of artists tagged for Emil. I also had to be quite specific, as some of the artists’ names are duplicated in the database (James Brown is a super-common artist name, who knew?).
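
To pick the right node when a name is duplicated, a lookup along these lines lists the candidates together with their disambiguating `comment` property (the same property used in the query that follows). It is a minimal sketch, and the artist name is just the example from above.

```cypher
// List every Artist node sharing a name, with the comment that tells them apart
MATCH (a:Artist {name: 'James Brown'})
RETURN a.name AS name, a.comment AS comment
ORDER BY comment
```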

Cypher

MATCH (a:Artist)
WHERE a.name IN ['Pink Floyd', 'Daft Punk', 'LCD Soundsystem', 'The Flaming Lips', 'ESG', 'Parquet Courts', 'The Velvet Underground', 'Nina Simone', 'Hercules and Love Affair']
OR (a.name = 'James Brown' AND a.comment = 'The Godfather of Soul')
OR (a.name = 'Air' AND a.comment = 'French band')
OR (a.name = 'Interpol' AND a.comment = 'NYC post-punk band')
SET a:JoeFave
RETURN a

To start my search for matches, I listened to a few tracks by Robyn on Spotify. She’s an artist I’ve heard of, and have probably heard before, but I’m not familiar with her body of work. The first thing I thought was, “Hey, this sounds similar to one of my favourite bands, Hercules & Love Affair.”

So I wrote a Cypher query for my graph to see whether these artists are somehow connected, and what the graph might look like between them.

Cypher

MATCH (startNode:EmilFave {name: 'Robyn'}), (endNode:JoeFave {name: 'Hercules and Love Affair'}),
path = allShortestPaths( (startNode)-[*]-(endNode) )
RETURN path

As it turns out, they both have a number of Releases that have the same tags!

Both Robyn and Hercules & Love Affair are associated with the tags Electronic, Synth-pop, Pop, House, Electro, Europop and Dance. I would certainly say that I like many of these musical genres, and if Emil does too, then I’m sure we’d have lots to talk about!
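
The shared tags can also be listed as a plain table rather than read off the path visualisation. The sketch below assumes that Releases hang off artists via `HAS_RELEASE` and off tags via `HAS_TAG`, in that direction, and that `Tag` nodes have a `name` property; neither is confirmed by the model shown here.

```cypher
// Sketch only: assumes (:Artist)-[:HAS_RELEASE]->(:Release)-[:HAS_TAG]->(:Tag)
// and a `name` property on Tag nodes (an assumption).
MATCH (:EmilFave {name: 'Robyn'})-[:HAS_RELEASE]->(:Release)-[:HAS_TAG]->(t:Tag)
MATCH (:JoeFave {name: 'Hercules and Love Affair'})-[:HAS_RELEASE]->(:Release)-[:HAS_TAG]->(t)
RETURN DISTINCT t.name AS shared_tag
ORDER BY shared_tag
```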

Next, I thought I’d see if any of my favourite artists performed the same songs (or “Works”) as those from Emil’s list:

Cypher

MATCH path = (:EmilFave)-[*]->(:Work)<-[*]-(:JoeFave) RETURN path

From the query results, I can see that Robbie Williams has performed two of the same works in the database as Nina Simone, though I’m sure if you asked them they’d each insist they did it “My Way.”

Robbie has also performed one of the same works as The Flaming Lips, which is a link I would never have guessed! It would be interesting to compare The Flaming Lips’ version of “Sgt. Pepper’s Lonely Hearts Club Band” with the version by Robbie Williams (which seems to have been part of the Take That “Progress Live” Tour setlist). I’m sure they are very different performances, indeed. I know which I’d prefer – maybe someday I’ll get to find out which one would be Emil’s favourite.
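
For a tabular view of the same overlap, the open-ended path match can be swapped for an explicit pattern. This is a sketch that reuses the `HAS_RECORDING` and `HAS_WORK` directions seen elsewhere in this article and assumes `Work` nodes have a `name` property.

```cypher
// Sketch only: one row per (Emil artist, my artist, shared Work).
// `w.name` is an assumed property.
MATCH (e:EmilFave)-[:HAS_RECORDING]->(:Recording)-[:HAS_WORK]->(w:Work)
      <-[:HAS_WORK]-(:Recording)<-[:HAS_RECORDING]-(j:JoeFave)
RETURN DISTINCT e.name AS emil_artist, j.name AS joe_artist, w.name AS work
ORDER BY emil_artist, joe_artist
```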

One of the artists that Emil follows on social media, Loney Dear, was entirely new to me, so I found some tracks on Spotify and gave them a listen. I really liked what I heard!

I wanted to see if Loney Dear was in any way connected to any of the artists I tagged as my favourites:

Cypher

MATCH (a:Artist {name: 'Loney Dear'}), (jf:JoeFave)
MATCH p = shortestPath( (a)-[:HAS_RELEASE|HAS_RECORDING|HAS_WORK*]-(jf) )
RETURN p

It turns out that Loney Dear is connected to 11 out of my 12 favourite artists in the graph – all but LCD Soundsystem.

Interestingly, all of his connections to my favourite artists are made via two artists who played on his recordings Lilies (Claire Bryant) and Hull (David Lindvall). I call this one “The Six Degrees of Loney Dear.”
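
To put a number on those degrees of separation, the same shortest-path query can return the hop count instead of the path itself. This is a small variation on the query above, not something run for the original post.

```cypher
// Same shortest-path search, but reporting path length per favourite artist
MATCH (a:Artist {name: 'Loney Dear'}), (jf:JoeFave)
MATCH p = shortestPath( (a)-[:HAS_RELEASE|HAS_RECORDING|HAS_WORK*]-(jf) )
RETURN jf.name AS favourite, length(p) AS hops
ORDER BY hops, favourite
```

Favourites with no connecting path simply drop out of the result, since the second `MATCH` finds nothing for them.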

Finally, I wanted to see which of the artists I think Emil likes are most similar to the artists I’ve tagged as my favourites.

I created a new type of direct relationship between an Artist and a Tag, where that artist is somehow also connected to that tag by a series of Recordings, Releases or Works. It’s sort of a shortcut, to make these relationships easier and quicker to work with.

Cypher

MATCH (a)-[:HAS_TAG|HAS_RECORDING|HAS_RELEASE|HAS_RELEASE_GROUP|HAS_WORK*]->(t:Tag)
WHERE a:EmilFave OR a:JoeFave
MERGE (a)-[:LINKED_TO_TAG]->(t)

This created 349 new relationships, which is not a lot and fewer than I was expecting. However, it was important to be specific about the relationship types and directions to follow in my query. Being too generic, or using open-ended relationship matches, could have linked Artists to Tags they aren’t really associated with.

For example, this query follows the HAS_WORK relationship in both directions:

Cypher

MATCH path = (a)-[:HAS_RECORDING]->(:Recording)-[:HAS_WORK]->(:Work)<-[:HAS_WORK]-(:Recording)-[:HAS_TAG]->(t:Tag)
WHERE a:EmilFave OR a:JoeFave
MERGE (a)-[:LINKED_TO_TAG]->(t)

This query creates 1,004 relationships, but links Nina Simone to the Heavy Metal tag because she covered the same song as Marilyn Manson! Data. Quality. Fail.
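
A quick way to catch this sort of false association is to list the tags each artist ended up linked to and eyeball them. The sketch below assumes `Tag` nodes have a `name` property.

```cypher
// Sketch only: audit the shortcut relationships per artist.
// `t.name` is an assumed property.
MATCH (a)-[:LINKED_TO_TAG]->(t:Tag)
WHERE a:EmilFave OR a:JoeFave
RETURN a.name AS artist, collect(t.name) AS tags
ORDER BY size(tags) DESC
```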

I then wrote a query to compare the tags for both Emil’s and my list of artists, to see what the overlap was:

Cypher

MATCH (startNode:EmilFave), (endNode:JoeFave)
MATCH (startNode)-[:LINKED_TO_TAG]->(t:Tag)<-[:LINKED_TO_TAG]-(endNode)
WITH startNode, endNode, count(DISTINCT t) AS Shared_Tag_Count
RETURN startNode.name AS Emil_Artist, endNode.name AS Joe_Artist, Shared_Tag_Count
ORDER BY Shared_Tag_Count DESC LIMIT 10

Not a bad list, necessarily, but not as good as I was hoping for either.

Is Kate Bush really so similar to Pink Floyd and James Brown? Where are ESG, LCD Soundsystem, Hercules & Love Affair and the rest of my favourite artists?

Looking at the database, I can unfortunately see some gaps. There are some artists that aren’t linked with Tags in the way I was expecting, which has skewed my results. Some more data import and clean-up work will be required to turn this huge, complex, crowd-sourced dataset into a complete graph-driven music recommendations engine!
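
One way to find those gaps is to ask which of the labelled artists ended up with no `LINKED_TO_TAG` relationship at all. This is a minimal sketch using only the labels and relationship type created earlier.

```cypher
// Favourite artists that have no shortcut link to any Tag
MATCH (a)
WHERE (a:EmilFave OR a:JoeFave)
  AND NOT (a)-[:LINKED_TO_TAG]->(:Tag)
RETURN a.name AS artist_without_tags
ORDER BY artist_without_tags
```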

Still, I think I have enough data now to make sure that if I’m ever stuck in a lift with Emil and need topics for conversation, I can make plenty of small talk about music! That’s one more recommendations use case chalked up to the power of the graph.

Now, what’s next?

My boss Emil is a super cool guy, who clearly has awesome taste in music and totally won’t mind that I stalked him on social media. I hope. This has been a fun exercise in personalising recommendations, though.

It’s an example of the kind of data analysis you might use for collaborative filtering, where you make recommendations to one person by comparing their data with the behaviour, preferences, purchases, etc. of other users. With more data loaded into the graph I could have done some really fun stuff: used some graph algorithms, looked at events and really fine-tuned the artist recommendation engine.
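
As one possible direction for that fine-tuning, the raw shared-tag count could be replaced by a Jaccard similarity (shared tags divided by the total distinct tags of both artists), so that artists with long tag lists don't automatically rise to the top. This is a sketch I haven't run against this dataset; it relies only on the `LINKED_TO_TAG` relationships created earlier.

```cypher
// Sketch only: Jaccard similarity over LINKED_TO_TAG, instead of a raw count
MATCH (e:EmilFave)-[:LINKED_TO_TAG]->(t:Tag)<-[:LINKED_TO_TAG]-(j:JoeFave)
WITH e, j, count(DISTINCT t) AS shared
// Total tags per artist, needed for the size of the union
MATCH (e)-[:LINKED_TO_TAG]->(et:Tag)
WITH e, j, shared, count(DISTINCT et) AS e_tags
MATCH (j)-[:LINKED_TO_TAG]->(jt:Tag)
WITH e, j, shared, e_tags, count(DISTINCT jt) AS j_tags
RETURN e.name AS Emil_Artist,
       j.name AS Joe_Artist,
       toFloat(shared) / (e_tags + j_tags - shared) AS jaccard
ORDER BY jaccard DESC LIMIT 10
```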

I found some good new music while writing this blog. Every time I listen to Loney Dear, I’ll remember that it was Emil who recommended this artist to me during an imaginary conversation in a lift we never rode together!

