Follow Recommendations in Social Networks
Follow Recommendations in Social Networks
Successful follow recommendations lead to a dynamic, connected network. Learn how to use AI and machine learning techniques to implement them!
Join the DZone community and get the full member experience.Join For Free
Insight for I&O leaders on deploying AIOps platforms to enhance performance monitoring today. Read the Guide.
Social media is a series of networks connecting individuals, companies, organizations, and groups to one another. These networks can transcend local, national, and international borders connecting people to networks far and wide. With all those connections, how can a user find the ones that they want to connect with?
That's where follow suggestions come in.
These days, many successful apps on social media offer some kind of follow suggestion functionality. This provides a way to recommend people that you're connected to in real life (or that you may be interested in) to follow based on your career, interests, or other connections. This brings up two questions: how do you implement this and why would you want to have follow recommendation functionality in your application?
Let's start with "why" first.
Successful Follow Recommendations Lead to a Dynamic, Connected Network
The more connected your network is, the more engaging it becomes. With more common connections, users are more likely to engage with that network and share its content with others. The more people, places, topics, etc. users follow, the more likely they are to revisit your app for interesting content. Not only does a highly connected network become more engaging for the end user but it also provides better insights for the development team into each user's engagement. It also creates clusters of users which can be used for targeted marketing techniques.
Implementing Follow Recommendations in Your App
Follow suggestions can be framed as a link prediction problem in social networks. Link prediction is modeling the evolution of a network by using features intrinsic to the network topology at time t to predict what new connections there will be at time t. By asking the question, "Who is most likely to follow whom?", one can optimize follow suggestions and help grow the network as quickly as possible.
Let's take a look at four different approaches to link prediction.
Perhaps the most intuitive or most similar to a real-life scenario is neighborhood-based methods, which rely on friends in common. In its simplest form, it can be looked at as, Alice is friends with Bob' Bob is friends with Frank; maybe Alice should be friends with Frank.
Some common methods to compute the likelihood of new links based on this idea include the Jaccard Coefficient and the Adamic-Adar Index:
The Jaccard Coefficient
The Jaccard Coefficient calculates the probability that there is a neighbor that both u and v have for a randomly selected neighbor that either u or v have. For the above graph, the Jaccard Coefficient between Bob and David would be calculated as follows:
This states that for a randomly chosen friend of Bob, there is a two-thirds chance that David also has that friend.
This has a nice feature that normalizes nodes that have many neighbors. The Jaccard Coefficient for the rest of the missing edges comes out to be:
The Adamic-Adar Index
The Adamic-Adar predictor refines the simple counting of common features between u and v by weighing less-used features more heavily. Friends without many connections but who bridge a gap between you and someone else have more influence than someone who is friends with everyone. For the above graph, the Adamic-Adar between Bob and David is calculated as follows/
Find friends in common:
Sum 1/log of the degree of those friends in common:
The rest of the missing edges comes out to be:
Perhaps the most well-known path-based method is the PageRank Algorithm, originally used by Google to rank webpages in order of importance. With the path-based method, a "random surfer" is modeled with limited attention and will follow links from one page to another, and then randomly close and restart on a random page. The PageRank is calculated by the proportion of the number of times that the surfer went to a given page. For social networks, this algorithm can be personalized for an individual node by changing the random reset location to itself. For example, to calculate the personalized PageRank for Alice, we would set the probability that the surfer resets all nodes (except Alice) equal to 0, then set the probability that the surfer resets Alice's node equal to 1. Running this on the above graph gives:
A nice feature the path-based methods give is the ability to weigh edges. A simple example is to weigh each edge by how recently it was created, allowing newer friendships or follows to have more weight and higher ranks than older ones. There are, of course, other algorithms that may better suit the needs of your applications such as HITS and SALSA, which calculate two metrics for each node (Hubs and Authority). These metrics may explain your network better if it's directed (follow relationship) instead of undirected (friendship)
Similar Interest-Based Methods
Stepping away from graph algorithms for a minute, another approach is similarity metrics (we wrote a blog post about matrix factorization if you'd like more information on how to calculate these). By using matrix factorization techniques, one can extract user feature vectors and calculate the similarities; cosine similarity is one of the more popular methods. The thought process behind the interest-based method is simple — if there is a user with similar interests to you, there is a good chance you'd want to follow that user. This can be particularly useful in follow-style platforms such as Twitter.
Machine Learning-Based Methods
This leads to a question of, "Which method is best for my application?", which depends on what you are trying to accomplish (though I'd be remiss if I didn't let a computer try to figure it out). If you treat the link prediction problem as a binary classification problem (1 for connected nodes and 0 for not connected) and use a mix of the features from above as input features for our model, you can then use your favorite classification model. SVM, decision trees, or XGBoost should work fairly well here.
Other Uses for Link Prediction in Networks
Another interesting use case for link prediction in networks is in e-commerce and targeted marketing, such as Amazon's "People who bought X also bought Y." By framing this as a link prediction problem, it is possible to recommend the most relevant product for that user. More specifically, this is a link prediction problem in a BiPartite network where one set of nodes are users and the other set is products.
Published at DZone with permission of Balazs Horanyi , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.