首页> 外文期刊>Data & Knowledge Engineering >Parallel community detection on large graphs with MapReduce and GraphChi
【24h】

Parallel community detection on large graphs with MapReduce and GraphChi

机译:使用MapReduce和GraphChi对大型图进行并行社区检测

获取原文
获取原文并翻译 | 示例

摘要

Community detection from social network data gains much attention from academia and industry since it has many real-world applications. The Girvan-Newman (GN) algorithm is a divisive hierarchical clustering algorithm for community detection, which is regarded as one of the most popular algorithms. It exploits the concept of edge betweenness to divide a network into multiple communities. Though it is being widely used, it has limitations in supporting large-scale networks since it needs to calculate the shortest path between every pair of vertices in a network. In this paper, we develop two parallel versions of the GN algorithm to support large-scale networks. First, we propose a new algorithm, which we call Shortest Path Betweenness MapReduce Algorithm (SPB-MRA), that utilizes the MapReduce model. Second, we propose another new algorithm, which we call Shortest Path Betweenness Vertex-Centric Algorithm (SPB-VCA), that utilizes the vertex-centric model. An approximation technique is also developed to further speed up community detection processes. We implemented SPB-MRA using Hadoop and SPB-VCA using GraphChi, and then evaluated the performance of SPB-MRA on Amazon EC2 instances and that of SPB-VCA on a single commodity PC. The evaluation results showed that the elapsed time of SPB-MRA decreased almost linearly as the number of reducers increased, SPB-VCA outperformed SPB-MRA just on a single PC by 4-6 times, and the approximation technique introduced negligible errors. (C) 2015 Elsevier B.V. All rights reserved.
机译:由于社交网络数据具有许多实际应用,因此从社交网络数据进行社区检测已引起了学术界和行业的广泛关注。 Girvan-Newman(GN)算法是用于社区检测的分割分层聚类算法,被认为是最受欢迎的算法之一。它利用边缘之间的概念将网络划分为多个社区。尽管已被广泛使用,但由于它需要计算网络中每对顶点之间的最短路径,因此在支持大规模网络方面存在局限性。在本文中,我们开发了两种并行版本的GN算法,以支持大规模网络。首先,我们提出了一种新的算法,我们称之为最短路径间隔MapReduce算法(SPB-MRA),该算法利用了MapReduce模型。其次,我们提出了另一种新算法,该算法称为“最短路径间隔顶点中心算法”(SPB-VCA),该算法利用了以顶点为中心的模型。还开发了一种近似技术来进一步加快社区检测过程。我们使用Hadoop实施了SPB-MRA,并使用GraphChi实施了SPB-VCA,然后评估了Amazon EC2实例上SPB-MRA的性能以及单台商用PC上SPB-VCA的性能。评估结果表明,随着还原器数量的增加,SPB-MRA的经过时间几乎呈线性下降,仅在单台PC上,SPB-VCA的性能就比SPB-MRA高出4-6倍,并且近似技术引入的误差可忽略不计。 (C)2015 Elsevier B.V.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号