Parallel Correlation Clustering on Big Graphs

Abstract

Given a similarity graph between items, correlation clustering (CC) groups similar items together and separates dissimilar ones. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and provably achieve nearly linear speedups. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination-free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3-approximation ratio. We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15× speedup.
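
The serial baseline described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual pivot formulation of KwikCluster (pick an unclustered vertex in random order, then cluster it with its still-unclustered similar neighbors); the function names `kwik_cluster` and `disagreements`, the adjacency-dict input format, and the toy graph are illustrative assumptions and are not taken from the paper. It does not implement C4 or ClusterWild!.

```python
import random


def kwik_cluster(adjacency, seed=0):
    """Serial pivot clustering (hypothetical helper, assumed formulation).

    adjacency maps each vertex to the set of vertices it is similar to.
    Returns a list of clusters, each a set of vertices.
    """
    rng = random.Random(seed)
    order = list(adjacency)
    rng.shuffle(order)  # a random permutation fixes the pivot order
    clustered = set()
    clusters = []
    for pivot in order:
        if pivot in clustered:
            continue  # already absorbed into an earlier pivot's cluster
        # One clustering round: the pivot plus its unclustered neighbors.
        cluster = {pivot} | {v for v in adjacency[pivot] if v not in clustered}
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(adjacency, clusters):
    """Count pairs that are similar but split, or dissimilar but merged."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    vertices = list(adjacency)
    cost = 0
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            similar = v in adjacency[u]
            together = label[u] == label[v]
            if similar != together:
                cost += 1
    return cost


# Toy similarity graph (assumed example): two disjoint cliques.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
clusters = kwik_cluster(graph)
```

Each pass of the loop that creates a cluster corresponds to one serial clustering round, which is the dependency chain that the abstract identifies as a bottleneck on large graphs.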

Bibliographic record

Source
Annual Conference on Neural Information Processing Systems, 2015, pp. 82-90 (9 pages)
Authors
Xinghao Pan; Dimitris Papailiopoulos; Samet Oymak; Benjamin Recht; Kannan Ramchandran; Michael I. Jordan
Original format: PDF

Similar literature

1. Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters [J] . Jie Yan, Guangming Tan, Zeyao Mo, IEEE Transactions on Parallel and Distributed Systems . 2016, No. 6
2. Correlation of telegraph noise between parallel and antiparallel states of magnetic tunnel junctions [J] . P. Dhagat, A. Jander, C. A. Nordman, Journal of Applied Physics . 2005, No. 10, Pt. 2
3. Parallelizing maximum likelihood classification on computer cluster and graphics processing unit for supervised image classification [J] . Shi Xuan, Xue Bowei, International Journal of Digital Earth . 2017, No. 7a9
4. Parallel Correlation Clustering on Big Graphs [C] . Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Annual conference on Neural Information Processing Systems . 2015
5. Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset. [D] . Peterson, Angela R. 2009
6. Parallelizing Affinity Propagation Using Graphics Processing Units for Spatial Cluster Analysis over Big Geospatial Data [O] . Xuan Shi -1
7. Figure 3: (A) Constrained ordination (CAP; Resource and Fencing are the constraining factors) with vector overlays representing simple Spearman correlations between response variables and the two axes (Appendix S2). Separation of communities along Axis 1 is largely related to the impact of the Resource treatment on community structure. Thus, the extent to which a vector is parallel with Axis 1 reflects the extent of the negative (to the left) or positive (to the right) correlation of densities of that taxon with the addition of detritus. The length of each vector represents the joint correlation of the response variable with both axes of the ordination, with the circle representing a correlation of 1. Vectors shown have Spearman coefficients with CAP Axis 1 that are ≥.50 or ≤−.50. To prevent clutter on the graph, arrow heads of the vectors are not drawn. Key to abbreviations is in Fig. 1. (B) Constrained ordination (CAP) with vector overlays representing multiple correlation coefficients (analogous to univariate partial correlation coefficients). [O] . -1