Article Recommendation With Personalized PageRank and Full-Text Search
Let's explore article recommendation with personalized PageRank and full-text search.
Six months ago, Tomaz Bratanic wrote a great post showing how to build an article recommendation engine using NLP techniques and the Personalized PageRank algorithm from the Graph Algorithms library.
In that post, Tomaz extracts keywords from each article using the GraphAware NLP library and then runs Personalized PageRank, using the articles associated with a given keyword as the source nodes.
I was curious whether I could create a poor man's version of Tomaz's work using the Full-Text Search functionality that was added in Neo4j 3.5, so here we are!
Tomaz explains how to import the data in his post, so we'll continue from there. The diagram below shows the graph model that we'll be working with. We have articles written by authors, and those articles can reference each other.

The first thing we need to do is create a Full-Text Search index for our Article nodes. We'll index the title and abstract properties on these nodes.
CALL db.index.fulltext.createNodeIndex('articlesAll', ['Article'], ['title', 'abstract'])
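Conceptually, a full-text index is an inverted index: it maps each term to the nodes whose indexed properties contain that term. The Python sketch below is a toy version of that idea over the title and abstract properties. It is not how Neo4j stores the index. Neo4j's full-text indexes are backed by Apache Lucene, whose analyzers do far more than the crude lower-case-and-split tokenizer assumed here, and the article data is made up for illustration.

```python
import re
from collections import defaultdict

# Made-up article nodes: node id -> properties.
ARTICLES = {
    1: {"title": "Social networks and graphs", "abstract": "We study community structure."},
    2: {"title": "Entropy estimation", "abstract": "Information theoretic learning in social settings."},
    3: {"title": "Web search", "abstract": "Ranking pages with link analysis."},
}


def tokenize(text):
    """Lower-case and split on non-alphanumerics.

    Assumption: a crude stand-in for a Lucene analyzer, not Neo4j's tokenizer.
    """
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(nodes, properties):
    """Map each token to the set of node ids whose indexed properties contain it."""
    index = defaultdict(set)
    for node_id, props in nodes.items():
        for prop in properties:
            for token in tokenize(props.get(prop, "")):
                index[token].add(node_id)
    return index


index = build_index(ARTICLES, ["title", "abstract"])
print(sorted(index["social"]))  # [1, 2]
```

Because both properties feed the same token map, a term matches an article whether it appears in the title (article 1) or only in the abstract (article 2).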
We can check on the progress of the index creation by running the following query:
CALL db.indexes()
It will have a state of POPULATING while node properties are being added to the index, and this state will change to ONLINE once it's done. The following query will block until the index is online:
CALL db.index.fulltext.awaitIndex("articlesAll")
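Waiting for the index amounts to polling its state until it leaves POPULATING. Here is a minimal Python sketch of that loop. It assumes a hypothetical get_state callable that returns the state reported by CALL db.indexes(); the timeout and polling interval are our own choices, not values taken from Neo4j.

```python
import itertools
import time


def await_index(get_state, timeout_seconds=60.0, poll_interval=0.5):
    """Poll get_state() until it returns "ONLINE".

    get_state is a hypothetical callable standing in for reading the index
    state from CALL db.indexes(). Raises RuntimeError if the index reports
    FAILED and TimeoutError if it is still not online after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state == "ONLINE":
            return
        if state == "FAILED":
            raise RuntimeError("index population failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index still {state} after {timeout_seconds}s")
        time.sleep(poll_interval)


# Simulated index that reports POPULATING twice, then ONLINE.
states = itertools.chain(["POPULATING", "POPULATING"], itertools.repeat("ONLINE"))
await_index(lambda: next(states), poll_interval=0.01)
print("index is online")
```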
Now that we've done this, let's get on with the algorithms.
Social Network Analysis Papers
Tomaz first explores articles that contain the phrase "social networks." Let's create a parameter containing that search term:
:param searchTerm => '"social networks"'
Note that we've put the search term in double quotes so that Full-Text Search treats it as a phrase rather than matching each term separately. Without the quotes, the query would also match articles that contain only "social" or only "networks".
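To see why the quotes matter, here is a toy Python model of the two kinds of matching, assuming Lucene's default OR operator for unquoted terms. It ignores scoring, stemming, and phrase slop, and the example titles are made up.

```python
import re


def tokenize(text):
    """Lower-case and split on non-alphanumeric characters (toy analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def matches_any_term(text, query):
    """Unquoted query: match if ANY term appears (assumes the default OR operator)."""
    tokens = set(tokenize(text))
    return any(term in tokens for term in tokenize(query))


def matches_phrase(text, phrase):
    """Quoted query: match only if the terms appear consecutively and in order."""
    tokens, terms = tokenize(text), tokenize(phrase)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


docs = [
    "Community detection in social networks",
    "Neural networks for image recognition",
    "Social choice theory",
]
print([d for d in docs if matches_any_term(d, "social networks")])  # all three titles
print([d for d in docs if matches_phrase(d, "social networks")])  # only the first title
```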
Now we want to run the PageRank algorithm from the point of view of the articles that contain this search term. Let's first check how many articles the full-text index returns:
CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node, score
RETURN count(*)
Just under 15,000 nodes, around 0.5 percent of all articles, are returned by the query. The following query returns the top 10 articles for the search term:
CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node, score
RETURN node.id, node.title, score
LIMIT 10
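Now we can pass these nodes to the PageRank algorithm through the sourceNodes config parameter. This turns it into Personalized PageRank: whenever the random surfer jumps, it lands on one of these source nodes rather than on an arbitrary node, so the scores are biased toward articles that are close to the source nodes in the reference graph.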
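The effect of sourceNodes can be illustrated with a textbook Personalized PageRank computed by power iteration. This Python sketch is our own and not the Graph Algorithms library's code; the library's implementation differs in details such as score scaling and dangling-node handling, so absolute scores will not match. The toy reference graph is made up, with o3 cited more often overall than s3.

```python
def personalized_pagerank(edges, source_nodes, damping=0.85, iterations=50):
    """Textbook Personalized PageRank by power iteration (not the library's code).

    edges: (citing, cited) pairs. source_nodes: the nodes the random surfer
    jumps back to. Passing every node as a source gives ordinary PageRank.
    """
    sources = set(source_nodes)
    nodes = {n for edge in edges for n in edge} | sources
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Jumps land only on source nodes, uniformly.
    teleport = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                share = damping * rank[n] / len(out_links[n])
                for target in out_links[n]:
                    new_rank[target] += share
            else:
                # Dangling node (cites nothing): return its mass to the sources.
                for s in sources:
                    new_rank[s] += damping * rank[n] / len(sources)
        rank = new_rank
    return rank


# Made-up citation graph: s1, s2 cite s3; o1, o2, o4 cite o3.
edges = [("s1", "s3"), ("s2", "s3"), ("o1", "o3"), ("o2", "o3"), ("o4", "o3")]
all_nodes = {n for edge in edges for n in edge}

global_rank = personalized_pagerank(edges, all_nodes)
social_rank = personalized_pagerank(edges, {"s1", "s2"})
print(global_rank["o3"] > global_rank["s3"])  # True
print(social_rank["s3"] > social_rank["o3"])  # True
```

With every node as a source, the function reduces to ordinary PageRank and the more-cited o3 comes out ahead; seeding only from s1 and s2 flips that ordering, which is the biasing effect sourceNodes is meant to have.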
The following query will find us the most influential articles about social networks:
CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node
WITH collect(node) AS articles
CALL algo.pageRank.stream('Article', 'REFERENCES', {sourceNodes: articles})
YIELD nodeId, score
WITH nodeId, score
ORDER BY score DESC
LIMIT 10
RETURN algo.getNodeById(nodeId).title AS article, score
As in Tomaz's post, Sergey Brin and Larry Page's paper describing Google shows up in first place.
Entropy to Me Is Not Entropy to You
In the next part of the post, Tomaz shows how we can write queries to find papers that would be interesting to researchers in different fields.
Let's start by recommending articles described by the keyword "entropy" from the point of view of Jose C. Principe.
First, we set up the parameters:
:param authorName => "Jose C. Principe";
:param searchTerm => "entropy"
And now run the query:
MATCH (a:Article)-[:AUTHOR]->(author:Author)
WHERE author.name = $authorName
WITH author, collect(a) AS articles
CALL algo.pageRank.stream(
  // Node query: only articles that match the search term
  'CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm) YIELD node RETURN id(node) AS id',
  // Relationship query: references between articles
  'MATCH (a1:Article)-[:REFERENCES]->(a2:Article) RETURN id(a1) AS source, id(a2) AS target',
  {sourceNodes: articles, graph: 'cypher', params: {searchTerm: $searchTerm}})
YIELD nodeId, score
// Keep author in scope so that we can exclude their own articles
WITH author, algo.getNodeById(nodeId) AS n, score
WHERE NOT exists((n)-[:AUTHOR]->(author))
RETURN n.title AS article, score,
       [(n)-[:AUTHOR]->(articleAuthor) | articleAuthor.name][..5] AS authors
ORDER BY score DESC
LIMIT 10
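This query combines two filters: the node query restricts the projected graph to articles that match the search term, and the WHERE clause drops the author's own articles from the ranked output. The Python sketch below mimics both steps on plain data structures. It assumes, as we understand the Cypher projection to behave, that relationships with an endpoint outside the node set are left out; the node ids and scores are made up and stand in for the output of algo.pageRank.stream.

```python
def induced_edges(edges, node_ids):
    """Keep only edges whose endpoints are both in node_ids.

    Assumption: this mirrors how a Cypher projection treats relationships
    that point at nodes the node query did not return.
    """
    keep = set(node_ids)
    return [(src, dst) for src, dst in edges if src in keep and dst in keep]


def recommend(scores, own_articles, k=10):
    """Drop the author's own articles, then return the k best (node, score) pairs."""
    candidates = [(n, s) for n, s in scores.items() if n not in own_articles]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Made-up data: p1 and the e* articles match "entropy", x does not;
# p1 is the author's own article.
references = [("p1", "e1"), ("e1", "e2"), ("e2", "x"), ("e3", "e2")]
matching = {"p1", "e1", "e2", "e3"}
print(induced_edges(references, matching))
# [('p1', 'e1'), ('e1', 'e2'), ('e3', 'e2')]

scores = {"p1": 0.5, "e1": 0.3, "e2": 0.4, "e3": 0.1}  # hypothetical PageRank output
print(recommend(scores, own_articles={"p1"}, k=2))
# [('e2', 0.4), ('e1', 0.3)]
```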
We'll see these results:
What if we run the same query for a different author?
:param authorName => "Hong Wang";
We'll see these results:
We don't get exactly the same results as Tomaz, but each author does still get a different set of recommendations.
Summary
It does seem that we can get a reasonable approximation of Tomaz's post using Neo4j's Full-Text Search functionality.
If you have any other ideas of what we can do with this dataset, let us know by commenting below.
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.