
Game of Friendship Paradox


The paradox is that your friends probably have more friends than you. We take a closer look at this head-scratcher and create some data visualization using code.


In the introduction of my course next week, I will (briefly) mention networks, and I wanted to provide some illustration of the Friendship Paradox. The Network of Thrones project (discussed in Beveridge and Shan (2016)) provides a dataset with the network of characters in Game of Thrones. The word “friend” might be a stretch here, but let’s continue to call connected nodes “friends.” The friendship paradox states that:

People, on average, have fewer friends than their friends.

This was discussed in Feld (1991) for instance, or Zuckerman & Jost (2001). Let’s try to see what it means here. First, let us get a copy of the dataset:
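As a minimal sketch of this step, assume the edge list is available as a CSV file with one row per pair of connected characters. The column names `Source` and `Target`, the object name `GoT`, and the few rows below are assumptions made for illustration, not the real dataset. A small temporary file stands in for the download so that the snippet runs on its own.

```r
# Hypothetical stand-in for the real file: a tiny edge list written to a
# temporary CSV. Column names (Source, Target) are assumptions.
toy_csv <- tempfile(fileext = ".csv")
writeLines(c("Source,Target",
             "Jon,Sam",
             "Jon,Arya",
             "Jon,Sansa",
             "Arya,Sansa"), toy_csv)

# Read the edge list, keeping names as character strings rather than factors
GoT <- read.csv(toy_csv, stringsAsFactors = FALSE)

# One row per undirected edge
str(GoT)
```

With the real file, only the path passed to `read.csv` would change, provided the first two columns hold the two endpoints of each edge.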


Because it is difficult for me to incorporate some d3.js scripts in the post, I will illustrate this with a more basic graph: 

Consider a vertex v ∈ V in the undirected graph G = (V, E) (with classical graph notation), and let d(v) denote the number of edges touching it (i.e., v has d(v) friends). The average number of friends of a random person in the graph is:

$$\mu = \frac{1}{|V|} \sum_{v \in V} d(v) = \frac{2|E|}{|V|}$$

The average number of friends that a typical friend has is:

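$$\frac{1}{|V|} \sum_{v \in V} \left( \frac{1}{d(v)} \sum_{v' : \{v, v'\} \in E} d(v') \right)$$

Each edge {v, v'} contributes twice to this double sum, once from each endpoint, so (assuming no vertex is isolated)

$$\sum_{v \in V} \frac{1}{d(v)} \sum_{v' : \{v, v'\} \in E} d(v') = \sum_{\{v, v'\} \in E} \left( \frac{d(v')}{d(v)} + \frac{d(v)}{d(v')} \right) = \sum_{\{v, v'\} \in E} \left( \frac{(d(v) - d(v'))^2}{d(v)\, d(v')} + 2 \right) \geq 2|E| = \sum_{v \in V} d(v)$$

and therefore

$$\frac{1}{|V|} \sum_{v \in V} \left( \frac{1}{d(v)} \sum_{v' : \{v, v'\} \in E} d(v') \right) \geq \frac{1}{|V|} \sum_{v \in V} d(v)$$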

Note that this can be related to the variance decomposition:

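$$\operatorname{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$

where X denotes the number of friends of a random node, i.e.,

$$\frac{\operatorname{Var}[X]}{\mathbb{E}[X]} = \frac{\mathbb{E}[X^2]}{\mathbb{E}[X]} - \mathbb{E}[X] \geq 0$$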

(Jensen's inequality). But let us get back to our network. The list of nodes is:
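The functions used in the rest of the post look up a node in the first column of a matrix `M` and read its friends from the second column. A minimal sketch of how such a matrix and the list of nodes could be built is given here, assuming each undirected edge is stacked in both directions. The toy edge list and the column names `from` and `to` are hypothetical.

```r
# Toy undirected edge list (hypothetical), one row per edge
edges <- data.frame(from = c("A", "A", "A", "B"),
                    to   = c("B", "C", "D", "C"),
                    stringsAsFactors = FALSE)

# Stack each edge in both directions, so that column 1 is a node and
# column 2 is one of its friends
M <- rbind(as.matrix(edges[, c("from", "to")]),
           as.matrix(edges[, c("to", "from")]))

# Every node appears in column 1 at least once, so this lists all nodes
nodes <- unique(M[, 1])
nodes
```

Listing every edge twice keeps the lookup symmetric: "A" is a friend of "B" exactly when "B" is a friend of "A".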


And for each of them, we can get the list of friends, and the number of friends:

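# Friends of x: entries of column 2 on the rows where x appears in column 1
friends = function(x) as.character(M[which(M[,1]==x),2])
# Number of friends, vectorized over a vector of node names
nb_friends = Vectorize(function(x) length(friends(x)))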

We can also get the number of friends each of our friends has, and average it:

# Number of friends of each friend of y
friends_of_friends = function(y) nb_friends(friends(y))
# Average number of friends among the friends of x, vectorized over nodes
nb_friends_of_friends = Vectorize(function(x) mean(friends_of_friends(x)))
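The functions above rescan the whole matrix for every node, which is fine for a small network. As an alternative sketch (not the approach used in this post), the same two quantities can be obtained in one pass with `table` and `tapply`, assuming `M` lists every edge in both directions. The toy matrix is hypothetical.

```r
# Toy symmetric edge matrix (hypothetical): every edge listed in both directions
M <- rbind(c("A", "B"), c("A", "C"), c("A", "D"), c("B", "C"),
           c("B", "A"), c("C", "A"), c("D", "A"), c("C", "B"))

# Degree of each node: how often it appears in column 1
deg <- table(M[, 1])

# For each node in column 1, average the degrees of its friends in column 2
mean_friend_deg <- tapply(as.numeric(deg[M[, 2]]), M[, 1], mean)

deg
mean_friend_deg
```

Here "A" has 3 friends whose own friend counts are 2, 2 and 1, so its entry in `mean_friend_deg` is 5/3, while "D", whose only friend is "A", gets 3.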

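We can look at the distribution of the number of friends of a random node (in red), together with the distribution of the average number of friends of its friends (in blue).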

Nb  = nb_friends(nodes)
Nb2 = nb_friends_of_friends(nodes)
hist(Nb,breaks=0:40,col=rgb(1,0,0,.2),border="white",probability = TRUE)
hist(Nb2,breaks=0:40,col=rgb(0,0,1,.2),border="white",probability = TRUE,add=TRUE)

And we can also compute the averages, just to check:

mean(Nb)
[1] 6.579439
mean(Nb2)
[1] 13.94243

So, indeed, people on average have fewer friends than their friends.
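The same check can be run by hand on a graph small enough to verify on paper. The sketch below uses a hypothetical star graph and compares three quantities: the average number of friends, the average over nodes of their friends' mean number of friends (the quantity computed above), and the ratio E[X^2]/E[X] that appears in the variance decomposition, which is the average number of friends of the endpoint of a random edge. All variable names are illustrative.

```r
# Toy graph (hypothetical): a star with centre "A" and leaves "B", "C", "D",
# every edge listed in both directions
M <- rbind(c("A", "B"), c("A", "C"), c("A", "D"),
           c("B", "A"), c("C", "A"), c("D", "A"))

deg <- table(M[, 1])     # d(v) for every node
d   <- as.numeric(deg)

# Average number of friends of a random node
mu <- mean(d)

# Average, over nodes, of the mean number of friends of their friends
local_mean <- mean(tapply(as.numeric(deg[M[, 2]]), M[, 1], mean))

# Average number of friends of the endpoint of a random edge: E[X^2] / E[X]
edge_mean <- sum(d^2) / sum(d)

# Population variance of the degrees (not the n - 1 version returned by var())
pop_var <- mean(d^2) - mu^2

c(mu = mu, local_mean = local_mean, edge_mean = edge_mean,
  mu_plus_var_ratio = mu + pop_var / mu)
```

On this star graph the four values are 1.5, 2.5, 2 and 2: both friend-of-friend averages exceed the plain average, and `edge_mean` equals the mean plus the variance divided by the mean, as the decomposition states.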


Published at DZone with permission of
