首页> 外文会议>Annual conference on Neural Information Processing Systems >Parallel Correlation Clustering on Big Graphs
【24h】

Parallel Correlation Clustering on Big Graphs

机译:大图上的并行相关聚类

获取原文

摘要

Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!', two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds, and provably achieve nearly linear speedups. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3 approximation ratio. We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15× speedup.
机译:给定项目之间的相似度图,相关性聚类(CC)将相似的项目组合在一起,而相异的项目则分开。最受欢迎的CC算法之一是KwikCluster:一种将顶点邻域连续聚类并获得3逼近比的算法。不幸的是,实际上,KwikCluster需要大量的聚类轮次,这是大型图形的潜在瓶颈。我们介绍了C4和ClusterWild!',这两种用于并行相关性聚类的算法以多对数轮数运行,并且可证明实现了近乎线性的加速。 C4使用并发控制来增强并行集群过程的可序列化性,并保证3的近似比率。 ClusterWild!是一种无协调算法,为了更好地缩放而放弃了一致性;这导致3近似比的损失很小。我们通过实验证明,在聚类精度和运行时间方面,这两种算法均优于最新技术。我们证明了我们的算法可以在5秒内在32个内核上对十亿个边缘图进行聚类,同时实现15倍的加速。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号