Machine Learning: Page Rank Algorithm
Join the DZone community and get the full member experience.Join For Free
Has google solved all the problems of searching, do we ever need to improve/understand/implement a search based algorithm ?
Recently while implementing a module in the product I am working on, I had to implement searching for the overall system. So I had to study different searching techniques. Lets talk about one of the core concepts of web searching called Page Rank Algorithm:
Two most important things in any web searching system are:
- Words in the search string
- Words present in the web pages of the system.
For e.g. while searching for "mountain biking" bike and bicycle need to be searched in the context of mountain. But note that Motorcycle may not be relevant in this context. So getting synonyms of the words in the search is part of information retrieval. But the heart of the system would be getting context and page rank helps us build that context.
Core Idea behind Page Rank algorithm is using the link structure of the web. The point to note here is web based systems are web graphs forming a directed network. In such a network web pages are vertices and links between them are the edges.(Which has both in-degree and out-degree)
Now in this complex system how do we decided which web pages are important pages, page ranks says that a web page is important if
- It has more number of in-degree pages i.e more number of pages are referring it
- The pages referring to this page also have high in-degree.
- This way the system will be a machine learning based system changing ranks of pages based on the in-degree which decides the quality of searches.
Here is the formal algorithm,
- suppose p and q are two web pages in your system q -> p
- Let R(q) be rank of q, let out(q) be out degree of q.
- q distributes its rank over its out link
- So rank of p is R(p) = sum( R(q) / out(q) ) (of all points for p)
- turn this equation into an update rule for every web page in the system
- This algorithm will converge and reach a point where ranks for all pages are computed once the circularity of all the pages is done.
- This is not as static as it appears because web pages get added and deleted causing page ranks to change all the time.
Important point to note here is Page rank defines rank of page based on the global system and is not based on locality. For e.g a web page may just have one link but overall in the system there might be pages with a very high page rank pointing to this page increasing its overall rank in the system.
Fundamentally this is how you're getting your search results in the web. Off course Google has many sophisticated algorithms on top of this, but page rank gives a rough idea of the overall system. This same concept is useful in understanding how social networking works, why a particular video goes viral. So next time you want to make a buzz in the web world you know which webpages to link to.
Opinions expressed by DZone contributors are their own.