首页> 外文期刊>Journal of Parallel and Distributed Computing >Multi-threaded modularity based graph clustering using the multilevel paradigm
【24h】

Multi-threaded modularity based graph clustering using the multilevel paradigm

机译:使用多级范例的基于多线程模块化的图聚类

获取原文
获取原文并翻译 | 示例

摘要

Graphs are an important tool for modeling data in many diverse domains. Recent increase in sensor technology and deployment, the adoption of online services, and the scale of VLSI circuits has caused the size of these graphs to skyrocket. Finding clusters of highly connected vertices within these graphs is a critical part of their analysis. In this paper we apply the multilevel paradigm to the modularity graph clustering problem. We improve upon the state of the art by introducing new efficient methods for coarsening graphs, creating initial clusterings, and performing local refinement on the resulting clusterings. We establish that for a graph with n vertices and m edges, these algorithms have an O(m+n) runtime complexity and an O(m+n) space complexity, and show that in practice they are extremely fast. We present shared-memory parallel formulations of these algorithms to take full advantage of modern architectures, which we show have a parallel runtime of O(m/p+n/p+k), where p is the number of threads and k is the number of clusters. Finally, we present the product of this research, the clustering tool Nerstrand. In serial mode, Nerstrand runs in a fraction of the time of current methods and produces results of equal quality. When run in parallel mode, Nerstrand exhibits significant speedup with less than one percent degradation of clustering quality. Nerstrand works well on large graphs, clustering a graph with over 105 million vertices and 3.3 billion edges in 90 s.
机译:图是在许多不同领域中建模数据的重要工具。传感器技术和部署的最新发展,在线服务的采用以及VLSI电路的规模使这些图形的尺寸猛增。在这些图中查找高度连接的顶点簇是其分析的关键部分。在本文中,我们将多级范例应用于模块化图聚类问题。我们通过引入新的有效方法来粗化图,创建初始聚类以及对所得聚类执行局部优化,从而改进了现有技术。我们建立了一个具有n个顶点和m个边的图,这些算法具有O(m + n)运行时复杂度和O(m + n)空间复杂度,并表明在实践中它们非常快。我们提出了这些算法的共享内存并行表述,以充分利用现代体系结构,我们展示了它们的并行运行时为O(m / p + n / p + k),其中p是线程数,k是线程数。集群数。最后,我们介绍了这项研究的产品,即聚类工具Nerstrand。在串行模式下,Nerstrand的运行时间仅为当前方法的一小部分,并且产生的结果质量相同。当以并行模式运行时,Nerstrand表现出显着的加速,并且集群质量降低了不到1%。 Nerstrand在大型图上效果很好,可以在90 s内将具有超过1.05亿个顶点和33亿条边的图聚类。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号