A fast kernel-based multilevel algorithm for graph clustering


Abstract

Graph clustering (also called graph partitioning), the task of clustering the nodes of a graph, is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation, which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k-means objective function. In this paper, we exploit this equivalence to develop a very fast multilevel algorithm for graph clustering. Multilevel approaches involve coarsening, initial partitioning and refinement phases, all of which may be specialized to different graph clustering objectives. Unlike existing multilevel clustering approaches, such as METIS, our algorithm does not constrain the cluster sizes to be nearly equal. Our approach gives a theoretical guarantee that the refinement step decreases the graph cut objective under consideration. Experiments show that we achieve better final objective function values than a state-of-the-art spectral clustering algorithm: on a series of benchmark test graphs with up to thirty thousand nodes and one million edges, our algorithm achieves lower normalized cut values in 67% of our experiments and higher ratio association values in 100% of our experiments. Furthermore, on large graphs, our algorithm is significantly faster than spectral methods. Finally, our algorithm requires far less memory than spectral methods; we cluster a 1.2 million node movie network into 5000 clusters, which, due to memory requirements, cannot be done directly with spectral methods.
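The two objectives named in the abstract, and the refinement idea behind the weighted kernel k-means equivalence, can be illustrated with a small sketch. This is not the authors' implementation: it covers only a batch refinement pass on a dense adjacency matrix and omits the coarsening and initial partitioning phases. The function names, the toy graph, and the kernel form K = sigma * D^-1 + D^-1 A D^-1 with node weights equal to degrees are assumptions taken from the commonly cited normalized-cut construction and are not stated in the abstract. The sketch also assumes every node has positive degree.

```python
import numpy as np

def normalized_cut(A, labels, k):
    """Sum over clusters of (edge weight leaving the cluster) / (cluster degree)."""
    deg = A.sum(axis=1)
    total = 0.0
    for c in range(k):
        mask = labels == c
        vol = deg[mask].sum()
        if vol > 0:  # empty clusters contribute nothing
            within = A[np.ix_(mask, mask)].sum()
            total += (vol - within) / vol
    return total

def ratio_association(A, labels, k):
    """Sum over clusters of (edge weight inside the cluster) / (cluster size)."""
    total = 0.0
    for c in range(k):
        mask = labels == c
        if mask.any():
            total += A[np.ix_(mask, mask)].sum() / mask.sum()
    return total

def kernel_kmeans_refine(A, labels, k, sigma=1.0, max_iter=50):
    """Batch weighted kernel k-means passes for the normalized cut objective.

    Assumed construction (not from the abstract): weights w = degrees and
    K = sigma * D^-1 + D^-1 A D^-1. Choosing sigma >= 1 keeps K positive
    semidefinite, which is what makes each batch pass non-increasing.
    """
    deg = A.sum(axis=1)
    w = deg
    K = sigma * np.diag(1.0 / deg) + A / np.outer(deg, deg)
    diag = np.diag(K)
    n = A.shape[0]
    labels = labels.copy()
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay unreachable
        for c in range(k):
            mask = labels == c
            s = w[mask].sum()
            if s == 0:
                continue
            # Squared distance to the weighted centroid in kernel space.
            cross = K[:, mask] @ w[mask] / s
            within = w[mask] @ K[np.ix_(mask, mask)] @ w[mask] / s**2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def two_cliques(m):
    """Hypothetical toy graph: two m-cliques joined by one bridge edge."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A

# Usage: start from a partition with one misplaced node (node 9) and refine.
A = two_cliques(5)
start = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])
refined = kernel_kmeans_refine(A, start, k=2)
ncut_before = normalized_cut(A, start, 2)
ncut_after = normalized_cut(A, refined, 2)
rassoc_after = ratio_association(A, refined, 2)
print(refined, ncut_before, ncut_after, rassoc_after)
```

Nothing in this sketch forces the clusters to be of equal size, which matches the abstract's contrast with METIS. A larger sigma makes nodes less likely to change cluster in a batch pass, so the choice of shift trades the monotonicity guarantee against the ability to escape a poor starting partition.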


Bibliographic record

Source
ACM SIGKDD international conference on Knowledge discovery in data mining, 2005, pp. 629-634 (6 pages)

Conference location
Chicago, IL (US)

Authors
Inderjit Dhillon; Yuqiang Guan; Brian Kulis


Original format: PDF

Chinese Library Classification: Computing technology, computer technology

Keywords
spectral clustering

