首页> 外文会议>ACM SIGKDD international conference on Knowledge discovery in data mining >A fast kernel-based multilevel algorithm for graph clustering
【24h】

A fast kernel-based multilevel algorithm for graph clustering

机译:一种基于内核的快速多级图聚类算法

获取原文

摘要

Graph clustering (also called graph partitioning) --- clustering the nodes of a graph --- is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k-means objective function. In this paper, we exploit this equivalence to develop a very fast multilevel algorithm for graph clustering. Multilevel approaches involve coarsening, initial partitioning and refinement phases, all of which may be specialized to different graph clustering objectives. Unlike existing multilevel clustering approaches, such as METIS, our algorithm does not constrain the cluster sizes to be nearly equal. Our approach gives a theoretical guarantee that the refinement step decreases the graph cut objective under consideration. Experiments show that we achieve better final objective function values as compared to a state-of-the-art spectral clustering algorithm: on a series of benchmark test graphs with up to thirty thousand nodes and one million edges, our algorithm achieves lower normalized cut values in 67% of our experiments and higher ratio association values in 100% of our experiments. Furthermore, on large graphs, our algorithm is significantly faster than spectral methods. Finally, our algorithm requires far less memory than spectral methods; we cluster a 1.2 million node movie network into 5000 clusters, which due to memory requirements cannot be done directly with spectral methods.
机译:图聚类(也称为图分区)---将图的节点聚类---是各种数据挖掘应用程序中的一个重要问题。传统方法涉及图形聚类目标的优化,例如归一化割或比率关联。光谱方法被广泛用于这些目标,但是它们需要特征向量计算,而这可能会很慢。最近,具有一般切割目标的图聚类在数学上已等效于适当的加权核 k -均值目标函数。在本文中,我们利用这种等效性来开发用于图聚类的非常快速的多级算法。多级方法涉及粗化,初始分区和细化阶段,所有这些阶段都可以专门用于不同的图聚类目标。与现有的多级聚类方法(例如METIS)不同,我们的算法不会将聚类大小约束为几乎相等。我们的方法提供了理论上的保证,即细化步骤减少了所考虑的图形切割目标。实验表明,与最新的光谱聚类算法相比,我们获得了更好的最终目标函数值:在一系列多达三万个节点和一百万个边的基准测试图上,我们的算法实现了较低的归一化割值我们有67%的实验具有较高的比率关联值,而有100%的实验具有较高的比率关联值。此外,在大图上,我们的算法比光谱方法要快得多。最后,我们的算法比光谱方法需要更少的内存。我们将一个120万个节点的电影网络集群为5000个集群,由于内存需求,无法直接使用频谱方法完成该网络。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号