Network Analytics: Centrality Part 1
Join the DZone community and get the full member experience.Join For Free
In network analysis, we often seek to distill the large network into a few attributes so that we can better understand its structure. A good place to start is finding which subgraphs of the network contain the most information. Thinking of nodes as the most elementary subgraph, we arrive at our first step in understanding our network: Which nodes are most important?
This is what data scientists refer to as Node Centrality and is a Local-Scale Feature.
Measuring Node Centrality
What if we simply measured a node's importance by how many nodes it is connected to? Also known as Degree Centrality, it takes a myopic approach to a node's importance. Let's say that you are friends with 40 people, but your friend Jake is only friends with 10. Degree centrality would say that you are more important in the social network than Jake, but what it failed to account for is that one of Jake's 10 friends was the President of the United States!
First, let's take a look at the proposed social network:
Are you really more important in the social network than Jake given that he knows the President of the United States? We need a measure that accounts for the importance of its neighboring nodes (i.e. the importance of your friends).
If you have a computer science background, you can probably sense recursion is near, but don't worry, eigenvectors will save the day!
We define the importance of node i as:
where A is the adjacency matrix.
All this means is that the importance of node i is the sum of the importance of its neighboring nodes. In the social network example, your importance is the sum of the importance of your friends. This recursive approach does not seem very attractive, but if we rearrange things, an important pattern emerges:
First, notice that
is the element-wise view of matrix-vector multiplication:
So it follows that
This last expression is simply an eigenvector problem* with eigenpair: (1/k, phi)
*Recall canonical form Av=xv where x is an eigenvalue and v is an eigenvector
Therefore, although the recursive definition is easier to understand, we can solve for each node's centrality score by simply solving the eigenvector problem (i.e. find phi) once.
Another important property to note is that since the adjacency matrix for an undirected graph will be real and symmetric, we can apply the Perron-Frobenius Theorem: lambda=1/k=1 and phi(i)>0 for all i=1..n. In other words: Finding the eigenvector of A corresponding to the eigenvalue 1 yields real and positive centrality scores for each node. Additionally we can scale these values to be less than or equal to 1 by normalizing.
This is a more believable result, with Jake and the President ranking much higher than you. Where are you ranked?
This result makes sense because you are 2 degrees of separation from the president: You-Jake-President. And your friends are 3 degrees of separation from the president: Friend-You-Jake-President. Therefore it makes sense that the centrality scores should be structured to reflect these degrees of separation.
In the coming weeks, we will continue to investigate more complex centrality measures, such as betweeness, pagerank, closeness, and move into spectral clustering. For now, all the code included uses the networkx python library and is posted on Github at https://github.com/Marco-Christiani/Network-Centrality if you want to experiment yourself.
Opinions expressed by DZone contributors are their own.