Article Recommendation With Personalized PageRank and Full-Text Search

Let's explore article recommendation with personalized PageRank and full-text search.

by Mark Needham · Mar. 07, 19 · Tutorial


Six months ago, Tomaz Bratanic wrote a great post showing how to build an article recommendation engine using NLP techniques and the Personalized PageRank algorithm from the Graph Algorithms library.

In the post, Tomaz extracts keywords for each article using the GraphAware NLP library and then runs Personalized PageRank from the point of view of the articles tagged with a given keyword.

I was curious whether I could create a poor man's version of Tomaz's work using the Full-Text Search functionality that was added in Neo4j 3.5, so here we are!

Tomaz explains how to import the data in his post, so we'll continue from there. The diagram below shows the graph model that we'll be working with. We have articles written by authors, modeled as (:Article)-[:AUTHOR]->(:Author), and articles that reference each other, modeled as (:Article)-[:REFERENCES]->(:Article).

The first thing we need to do is create a Full-Text Search index for our Article nodes. We'll index the title and abstract properties on these nodes.

CALL db.index.fulltext.createNodeIndex('articlesAll', ['Article'], ['title', 'abstract'])
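Conceptually, a full-text index over title and abstract is an inverted index: a map from each term to the articles whose indexed properties contain it. The toy Python sketch below illustrates only that idea. The tokenize and build_index helpers and the sample articles are hypothetical; Neo4j's full-text indexes are backed by Lucene, whose analyzers and storage are far more sophisticated.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case and split on non-alphanumeric characters (a crude stand-in for an analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(articles, fields=("title", "abstract")):
    """Inverted index: term -> set of ids of articles whose indexed fields contain it."""
    index = defaultdict(set)
    for article_id, props in articles.items():
        for field in fields:
            for term in tokenize(props.get(field, "")):
                index[term].add(article_id)
    return index

# Hypothetical sample data, keyed by article id.
articles = {
    1: {"title": "Social networks and graphs", "abstract": "We study networks."},
    2: {"title": "Entropy estimation", "abstract": "Information theoretic learning."},
}
index = build_index(articles)
print(sorted(index["networks"]))  # [1]
```

Indexing both properties under one index is what lets a single query match a term whether it appears in the title or only in the abstract.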

We can check on the progress of the index creation by running the following query:

CALL db.indexes()


It will have a state of POPULATING while node properties are being added to the index. This state will change to ONLINE once it's done. The following query will block until the index is online:

CALL db.index.fulltext.awaitIndex("articlesAll")

Now that we've done this, let's get on with the algorithms.

Social Network Analysis Papers

Tomaz first explores articles that contain the phrase "social networks." Let's create a parameter containing that search term:

:param searchTerm => '"social networks"'

Note that we've put the search term in quotes. We do this so that Full-Text Search will treat the term as a phrase rather than interpreting each term separately.
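The difference between an unquoted and a quoted search can be sketched in a few lines of Python. The helpers below are hypothetical and assume the usual Lucene query-parser default, where unquoted terms are alternatives (any one of them can produce a hit), while a quoted phrase must appear as consecutive terms in order. Lucene itself matches phrases using term positions stored in the index rather than by scanning token lists.

```python
import re

def tokenize(text):
    """Lower-case and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def matches_any_term(text, query):
    """Unquoted query: the document matches if it contains any of the query terms."""
    tokens = set(tokenize(text))
    return any(term in tokens for term in tokenize(query))

def matches_phrase(text, phrase):
    """Quoted query: the phrase terms must appear consecutively and in order."""
    tokens = tokenize(text)
    terms = tokenize(phrase)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

docs = [
    "Mining social networks for communities",
    "Neural networks for social robots",
]
print([matches_any_term(d, "social networks") for d in docs])  # [True, True]
print([matches_phrase(d, "social networks") for d in docs])    # [True, False]
```

The second document mentions both words but not as the phrase "social networks", so only the quoted search excludes it.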

Now we want to run the PageRank algorithm from the point of view of the articles that contain this search term. First, let's see how many articles the full-text index returns:

CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node, score
RETURN count(*)


The query returns just under 15,000 nodes, around 0.5 percent of all articles. The following query returns the top 10 articles for the search term:

CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node, score
RETURN node.id, node.title, score
LIMIT 10
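The score column is a relevance score computed by Lucene. As a rough illustration of what "top 10 by score" means, the sketch below ranks documents with plain TF-IDF and keeps the best k. This is a deliberately simplified stand-in: recent Lucene versions default to BM25, which adds document-length normalization and term-frequency saturation, so these numbers will not match the scores Neo4j reports. The top_k function and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def top_k(docs, query, k=10):
    """Rank docs by a plain TF-IDF sum over the query terms; return the best k (id, score) pairs."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    terms = tokenize(query)
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for counts in term_counts.values() if t in counts) for t in terms}
    results = []
    for doc_id, counts in term_counts.items():
        # Term frequency times a smoothed inverse document frequency.
        score = sum(counts[t] * math.log(1 + n_docs / df[t]) for t in terms if df[t])
        if score > 0:
            results.append((doc_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return results[:k]

docs = {1: "Entropy, entropy estimation", 2: "Entropy", 3: "Graphs"}
print([doc_id for doc_id, _ in top_k(docs, "entropy")])  # [1, 2]
```

Documents that contain no query term get a score of zero and are left out entirely, which is why the count query above returns only a small fraction of all articles.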


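Now we can feed these nodes into the PageRank algorithm through the sourceNodes config parameter. This turns it into Personalized PageRank: random jumps land only on these nodes, which biases the scores toward articles close to them in the REFERENCES graph.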
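The idea behind sourceNodes can be sketched as a minimal power-iteration Personalized PageRank in Python: the random surfer follows edges, and every teleport (plus any mass stuck at a node without outgoing edges) goes back to the source nodes only. The function name, its parameters, and the choice of how to handle dangling nodes are assumptions of this sketch; the Graph Algorithms library's implementation may differ in details such as score scaling, so absolute scores will not match.

```python
def personalized_pagerank(edges, source_nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank whose random jumps land only on source_nodes."""
    sources = set(source_nodes)
    nodes = set(sources)
    out_links = {}
    for src, dst in edges:
        nodes.update((src, dst))
        out_links.setdefault(src, []).append(dst)
    # Teleport vector: uniform over the source nodes, zero everywhere else.
    teleport = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Every node first receives its share of the (1 - damping) jump probability.
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src)
            if targets:
                # Follow an outgoing edge, split evenly across the targets.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: hand its mass back to the source nodes.
                for n in sources:
                    new_rank[n] += damping * rank[src] * teleport[n]
        rank = new_rank
    return rank

# Usage: nodes 1-3 form a cycle reachable from the seed; 4 and 5 are not reachable.
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 4)]
rank = personalized_pagerank(edges, [1])
print(rank[1] > rank[2] > rank[3], rank[4], rank[5])  # True 0.0 0.0
```

Nodes that cannot be reached from any source node end up with a score of zero, no matter how well connected they are among themselves. Passing every node as a source makes the teleport vector uniform, and the function reduces to ordinary PageRank.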

The following query will find us the most influential articles about social networks:

CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm)
YIELD node
WITH collect(node) AS articles
CALL algo.pageRank.stream('Article', 'REFERENCES', {
  sourceNodes: articles
})
YIELD nodeId, score
WITH nodeId, score
ORDER BY score DESC
LIMIT 10
RETURN algo.getNodeById(nodeId).title AS article, score


As in Tomaz's post, Sergey Brin and Larry Page's paper describing Google shows up in first place.

Entropy to Me Is Not Entropy to You

In the next part of the post, Tomaz shows how we can write queries to find papers that would be interesting to researchers in different fields.

Our first example recommends articles described by the keyword "entropy" from the point of view of Jose C. Principe.

Let's set up the parameters:

:param authorName => "Jose C. Principe";
:param searchTerm => "entropy"

And now run the query:

MATCH (a:Article)-[:AUTHOR]->(author:Author)
WHERE author.name = $authorName
WITH author, collect(a) AS articles
CALL algo.pageRank.stream(
  'CALL db.index.fulltext.queryNodes("articlesAll", $searchTerm) YIELD node RETURN id(node) AS id',
  'MATCH (a1:Article)-[:REFERENCES]->(a2:Article) RETURN id(a1) AS source, id(a2) AS target',
  {
    sourceNodes: articles,
    graph: 'cypher',
    params: {searchTerm: $searchTerm}
  })
YIELD nodeId, score
WITH author, algo.getNodeById(nodeId) AS n, score
// Skip the author's own articles (AUTHOR points from article to author)
WHERE NOT exists((n)-[:AUTHOR]->(author))
RETURN n.title AS article, score,
       [(n)-[:AUTHOR]->(other) | other.name][..5] AS authors
ORDER BY score DESC
LIMIT 10

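The query returns the ten highest-scoring "entropy" articles that Jose C. Principe didn't write, along with up to five authors for each.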

What if we run the same query for a different author?

:param authorName => "Hong Wang";

Rerunning the query with this parameter ranks the "entropy" articles from Hong Wang's point of view instead.

We don't get exactly the same results as Tomaz, but we do still get a different set of results for the different authors.
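Why different authors get different recommendations from the same subgraph can be illustrated with a small, self-contained Python sketch that mirrors the steps of the author-centric query: keep only articles matching the search term (the node query), keep only references between them (the relationship query), seed Personalized PageRank with the author's own articles, then drop those articles and keep the top results with up to five authors each. Everything here is hypothetical: the data layout, the toy data, the simple any-term matching used in place of the full-text index, and the assumption that references to articles outside the projected subgraph are ignored.

```python
import re

def tokenize(text):
    """Lower-case and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def personalized_pagerank(nodes, edges, sources, damping=0.85, iterations=50):
    """Power-iteration PageRank whose random jumps land only on `sources`."""
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    teleport = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
            else:
                # Dangling article: send its mass back to the source articles.
                for n in sources:
                    new_rank[n] += damping * rank[src] * teleport[n]
        rank = new_rank
    return rank

def recommend(articles, references, author, search_term, k=10):
    """Rank matching articles from the point of view of one author's own articles."""
    terms = set(tokenize(search_term))
    # Node set: articles whose title or abstract contains any query term.
    nodes = {
        a for a, p in articles.items()
        if terms & set(tokenize(p["title"] + " " + p["abstract"]))
    }
    # Relationship set: only references between two matching articles survive.
    edges = [(s, t) for s, t in references if s in nodes and t in nodes]
    # Seeds: the author's articles that are part of the subgraph.
    sources = {a for a in nodes if author in articles[a]["authors"]}
    if not sources:
        return []
    rank = personalized_pagerank(nodes, edges, sources)
    # Exclude the author's own articles, then keep the k highest scores.
    candidates = [a for a in nodes if author not in articles[a]["authors"]]
    candidates.sort(key=lambda a: rank[a], reverse=True)
    return [(articles[a]["title"], rank[a], articles[a]["authors"][:5])
            for a in candidates[:k]]

# Hypothetical toy data: article id -> properties, plus (citing, cited) pairs.
articles = {
    1: {"title": "Entropy in learning", "abstract": "", "authors": ["Author A"]},
    2: {"title": "Maximum entropy models", "abstract": "", "authors": ["Author C"]},
    3: {"title": "Entropy of graphs", "abstract": "", "authors": ["Author D"]},
    4: {"title": "Thermodynamic entropy", "abstract": "", "authors": ["Author B"]},
    5: {"title": "Graph databases", "abstract": "", "authors": ["Author E"]},
}
references = [(1, 2), (2, 3), (4, 3), (1, 5)]

for name in ("Author A", "Author B"):
    print(name, [title for title, _, _ in recommend(articles, references, name, "entropy")])
```

Both runs use exactly the same subgraph, yet seeding from Author A's article puts "Maximum entropy models" first, while seeding from Author B's article puts "Entropy of graphs" first.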

Summary

In summary, it seems we can get a reasonable approximation of Tomaz's post using Neo4j's Full-Text Search functionality.

If you have any other ideas of what we can do with this dataset, let us know by commenting below.


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
