Community search is a very important aspect of Social Network (SN) analysis. Communities are defined as tightly coupled groups of people, who, for instance, communicate or interact with the members of the community more intensely than with the rest of the population. However, SNs are complex representations of society and understanding their structure is necessary to be able to find those communities accurately.
Let us give a few examples of how important is community detection in different environments. In SNs like Facebook or LinkedIn it is important to detect communities of users who are interested in a topic and also are tightly linked among them, unveiling target groups for specific marketing campaigns. In Telco environments, it is important to understand potential churn communities. In police investigation, it is important to detect groups of criminals who act in a coordinated manner. In networks from biology, computer science, engineering, economics and politics, communities are also important in different ways.
The Scalable Community Detection (SCD) algorithm, presented recently at the WWW’14 conference in Korea last March 2014, shows how understanding the nature of social communities may be important to be more accurate and fast finding such communities in very large graphs.
SCD exploits a basic property of SNs, i.e. they have a number of transitive relations (triangles) larger than other types of networks. As an extension, the number of triangles in a specific community will be larger than in the whole SN. The basic idea of SCD is to search for those triangles and understand how they structure to form cohesive and structured communities around those triangles.
In the paper, the previous fastest and most accurate algorithms are compared to our algorithm showing that SCD is faster than the fastest (the Louvain algorithm) and more accurate than those providing the highest quality (the Oslom algorithm).
Based on the Friendster (a sample of 65M nodes and more than 1.8B edges) and Orkut (3M nodes and more than 117M edges) datasets from SNAP, we show the most interesting results of the comparison between the three algorithms. Oslom is not able to execute on Friendster, and takes almost 5 days to compute the communities of Orkut. The quality obtained by Oslom on Orkut is slightly worse than that obtained by SCD. The plots show that while SCD is almost 7 times faster than Louvain for Friendster, it provides 74 times more quality. Note that in terms of performance SCD takes 3 hours and 30 minutes while Louvain takes a full day to analyze the same dataset. With Orkut, a significantly smaller dataset, SCD is 30% faster and obtains 20% more quality than Louvain (not in plots).
In addition, SCD provides a highly parallel scalable solution with close to linear speedup, showing better speedups for larger datasets as shown in the plot below, where SCD is tested for Orkut and Friendster on one, two and four cores. This very good behavior is caused by the fact that SCD is based on triangle search which is naturally parallel, dominating most of the computation.
This research has been possible through the collaboration of Sparsity Technologies with DAMA-UPC, showing their compromise with high quality research and technology. You can see more of our joint research in www.sparsity-technologies.com/blog.