Article Recommendation With Personalized PageRank and Full-Text Search

Let's explore article recommendation with personalized PageRank and full-text search.

Six months ago, Tomaz Bratanic wrote a great post showing how to build an article recommendation engine using NLP techniques and the Personalized PageRank algorithm from the Graph Algorithms library.

In the post, Tomaz extracts keywords for each article using the GraphAware NLP library and then runs PageRank in the context of articles based on these keywords.

I was curious whether I could create a poor man's version of Tomaz's work using the Full-Text Search functionality that was added in Neo4j 3.5, so here we are!

Tomaz explains how to import the data in his post, so we'll continue from there. The diagram below shows the graph model that we'll be working with: articles are connected to their authors by AUTHOR relationships, (:Article)-[:AUTHOR]->(:Author), and can reference each other via REFERENCES relationships, (:Article)-[:REFERENCES]->(:Article).

The first thing we need to do is create a Full-Text Search index for our Article nodes. We'll index the title and abstract properties on these nodes.

CALL db.index.fulltext.createNodeIndex('articlesAll', ['Article'], ['title', 'abstract'])
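Neo4j's full-text indexes are backed by Apache Lucene. As a rough mental model of what indexing two properties means, here is a toy inverted index in Python that maps each token from an article's title and abstract to the ids of the articles containing it. This is a minimal sketch, not Lucene's implementation: the tokenizer (lowercase, split on anything that isn't a letter or digit) is an assumption, and there is no scoring, stemming, or stop-word handling. The names `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit (toy analyzer)."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_index(articles, fields=("title", "abstract")):
    """Map each token to the ids of articles whose indexed fields contain it.

    articles: dict of article id -> dict of properties (hypothetical shape).
    """
    index = defaultdict(set)
    for article_id, props in articles.items():
        for field in fields:
            # Both fields feed the same postings, so a match in either one counts.
            for token in tokenize(props.get(field, "")):
                index[token].add(article_id)
    return index
```

Because title and abstract share one set of postings, a term that only appears in an article's abstract still finds that article.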

We can check on the progress of the index creation by running the following query:

CALL db.indexes()

The index will have a state of POPULATING while node properties are being added to it. This state will change to ONLINE once it's done. The following query will block until the index is online:

CALL db.index.fulltext.awaitIndex("articlesAll")

Now that we've done this, let's get on with the algorithms.

Social Network Analysis Papers

Tomaz first explores articles that contain the phrase "social networks." Let's create a parameter containing that search term:

:param searchTerm => '"social networks"'

Note that we've put the search term in quotes. We do this so that Full-Text Search will treat the term as a phrase rather than interpreting each term separately.
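The difference between a phrase and separate terms can be sketched in a few lines of Python. This toy version assumes that an unquoted query combines its terms with OR (the default operator of Lucene's classic query parser) and that a quoted phrase requires the terms to appear next to each other, in order. It ignores scoring and phrase slop, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit (toy analyzer)."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def matches_any_term(text, query):
    """Unquoted query: each term is checked separately and the results are OR-ed."""
    tokens = set(tokenize(text))
    return any(term in tokens for term in tokenize(query))


def matches_phrase(text, phrase):
    """Quoted query: all terms must appear consecutively, in the same order."""
    tokens, terms = tokenize(text), tokenize(phrase)
    width = len(terms)
    # Slide a window of len(terms) tokens across the text.
    return any(tokens[i:i + width] == terms for i in range(len(tokens) - width + 1))
```

A title such as "Networks of neurons" satisfies the unquoted query `social networks` but not the phrase, which is why quoting the search term keeps the result set focused.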

Now we want to call the PageRank algorithm from the point of view of articles that contain this search term. Let's first see how many articles the full-text index returns:

CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node, score
RETURN count(*)

Just under 15,000 nodes, around 0.5 percent of all articles, are returned by the query. The following query will return the top 10 articles for the search term:

CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node, score
RETURN node.id, node.title, score
LIMIT 10

Now we can feed these nodes into the PageRank algorithm via the sourceNodes config parameter. This biases the results of the algorithm towards these nodes and the articles reachable from them.
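To see why sourceNodes biases the results, here is a minimal power-iteration sketch of personalized PageRank in Python. The only change from ordinary PageRank is the teleport vector: instead of jumping to any node with equal probability, the random surfer restarts only at the source nodes, so rank concentrates on articles reachable from them through REFERENCES. This is a textbook formulation, not the Graph Algorithms library's code; its normalization and its handling of dangling nodes (here their rank simply leaks away) may differ from the library's, so compare rankings rather than raw scores. The function names are hypothetical, and the damping factor of 0.85 and 20 iterations are just common defaults.

```python
def personalized_pagerank(edges, source_nodes, damping=0.85, iterations=20):
    """Power-iteration PageRank whose random jumps land only on source nodes.

    edges: (source, target) pairs, e.g. one per REFERENCES relationship.
    source_nodes: ids playing the role of the sourceNodes config parameter.
    """
    sources = set(source_nodes)
    nodes = {n for edge in edges for n in edge} | sources
    out_links = {n: [] for n in nodes}
    for s, t in edges:
        out_links[s].append(t)

    # Teleport vector: uniform over the source nodes, zero everywhere else.
    teleport = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes only to the source nodes...
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        # ...and the rest flows along outgoing edges. Nodes without
        # outgoing edges leak their rank (a simplification).
        for n, targets in out_links.items():
            for t in targets:
                new_rank[t] += damping * rank[n] / len(targets)
        rank = new_rank
    return rank


def top_k(rank, k=10):
    """Highest-scoring nodes first, like ORDER BY score DESC LIMIT k."""
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

With edges a→b, b→c and x→y seeded at a, the nodes x and y end up with a score of exactly zero, because no walk that restarts at a can ever reach them.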

The following query will find us the most influential articles about social networks:

CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node
WITH collect(node) as articles
CALL algo.pageRank.stream('Article', 'REFERENCES', {
  sourceNodes: articles
})
YIELD nodeId, score
WITH nodeId, score
ORDER BY score DESC
LIMIT 10
RETURN algo.getNodeById(nodeId).title as article, score

As in Tomaz's post, Sergey Brin and Larry Page's paper describing Google shows up in first place.

Entropy to Me Is Not Entropy to You

In the next part of the post, Tomaz shows how we can write queries to find papers that would be interesting to researchers in different fields.

We'll start by recommending articles described by the keyword "entropy" from the point of view of Jose C. Principe.

Let's set up the parameters:

:param authorName => "Jose C. Principe";
:param searchTerm => "entropy"

And now run the query:

MATCH (a:Article)-[:AUTHOR]->(author:Author)
WHERE author.name = $authorName
WITH author, collect(a) as articles
CALL algo.pageRank.stream(
  'CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm) YIELD node RETURN id(node) as id',
  'MATCH (a1:Article)-[:REFERENCES]->(a2:Article) RETURN id(a1) as source, id(a2) as target',
  {
    sourceNodes: articles,
    graph: 'cypher',
    params: {searchTerm: $searchTerm}
  })
YIELD nodeId, score
// Keep author in scope so the WHERE clause can exclude the author's own articles
WITH author, algo.getNodeById(nodeId) AS n, score
WHERE NOT exists((author)-[:AUTHOR]->(n))
// Use a fresh variable so the list isn't restricted to the already-bound author
RETURN n.title as article, score,
       [(n)-[:AUTHOR]->(coAuthor:Author) | coAuthor.name][..5] AS authors
ORDER BY score DESC
LIMIT 10

We'll see these results:

What if we run the same query for a different author?

:param authorName => "Hong Wang";

We'll see these results:

We don't get exactly the same results as Tomaz, but we do still get a different set of results for the different authors.
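The author-specific query combines three ideas: a node set restricted to full-text matches, source nodes taken from the author's own articles, and a final filter that drops those articles from the recommendations. The Python sketch below mirrors that structure under stated assumptions: it keeps only REFERENCES whose endpoints both matched the search (our reading of how a Cypher projection treats relationships to nodes outside the node query), and it drops source nodes that are not in the matched set, which may not be how the library behaves. The function name `recommend_for_author` is hypothetical, and the PageRank loop is a textbook power iteration rather than the library's implementation.

```python
def recommend_for_author(references, matching, author_articles, k=10,
                         damping=0.85, iterations=20):
    """Rank full-text matches for one author with personalized PageRank.

    references: (citing, cited) pairs for every REFERENCES relationship.
    matching: ids returned by the full-text search (the projected node set).
    author_articles: ids of the author's own articles (the source nodes).
    """
    nodes = set(matching)
    own = set(author_articles)
    # Keep only references whose endpoints both matched the search term.
    out_links = {n: [] for n in nodes}
    for s, t in references:
        if s in nodes and t in nodes:
            out_links[s].append(t)

    # Assumption: seeds outside the projected node set are dropped.
    seeds = nodes & own
    if not seeds:
        return []
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for n, targets in out_links.items():
            for t in targets:
                new_rank[t] += damping * rank[n] / len(targets)
        rank = new_rank

    # Equivalent of WHERE NOT exists((author)-[:AUTHOR]->(n)).
    candidates = [(n, r) for n, r in rank.items() if n not in own]
    candidates.sort(key=lambda kv: kv[1], reverse=True)
    return candidates[:k]
```

For example, with references p1→e1, p1→e2, e1→e2 and p1→x, a search that matches only p1, e1 and e2, and an author who wrote p1, the sketch recommends e2 ahead of e1: e2 is cited both directly from p1 and via e1, while x never enters the projected graph.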

Summary

In summary, it seems that we can get a reasonable approximation of Tomaz's post using Neo4j's Full-Text Search functionality.

If you have any other ideas of what we can do with this dataset, let us know by commenting below.

Published at DZone with permission of
