Conference: Electrical and Computer Engineering, 2005. Canadian Conference on

A fast clustering algorithm based on grid and density


Abstract

The efficiency of data mining algorithms becomes an increasingly important issue as data sets grow larger. Density-based clustering can discover clusters of arbitrary shape and is insensitive to noise. The advantage of grid-based clustering is its linear time complexity. In this paper, we present CLUGD, a new clustering algorithm that relies on both grid and density. We first construct a grid of the relevant portions. The algorithm then finds references by grid and classifies them into core references and bound references. Next, it attaches the data of the bound references to the nearest core references and aggregates the core references in neighboring portions. Finally, an undirected graph is used to classify the core references, and the resulting clusters are mapped back to the original data. We evaluated the effectiveness and efficiency of CLUGD experimentally using synthetic data and data from the SEQUOIA 2000 Benchmark. Both theoretical analysis and experimental results confirm that CLUGD can discover clusters of arbitrary shape and is insensitive to noise. In addition, it executes much more efficiently than the DBSCAN algorithm based on the R*-tree.
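
The abstract only outlines the pipeline, so the following is a minimal Python sketch of a generic grid-plus-density clustering pass, not the authors' CLUGD implementation. It makes several simplifying assumptions: each grid cell stands in for a "reference", a cell is treated as core when it holds at least `min_pts` points and as bound otherwise, bound cells attach only to an adjacent core cell, and the graph step is read as finding connected components of an undirected graph over adjacent core cells. The names `grid_density_cluster`, `cell_size`, and `min_pts` are hypothetical and do not come from the paper.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_density_cluster(points, cell_size, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Simplified illustration only; cells approximate the paper's "references".
    """
    # 1. Bin points into grid cells keyed by integer cell coordinates.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(idx)

    # 2. Dense cells are "core"; the remaining non-empty cells are "bound"
    #    (assumed threshold rule, not taken from the paper).
    core = {k for k, members in cells.items() if len(members) >= min_pts}
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    def neighbors(key):
        return (tuple(k + o for k, o in zip(key, off)) for off in offsets)

    # 3. Undirected graph over core cells (edges join adjacent cells);
    #    each connected component, found by BFS, becomes one cluster.
    cell_label = {}
    next_label = 0
    for start in sorted(core):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in neighbors(cur):
                if nb in core and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    def centroid(key):
        members = cells[key]
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    # 4. Map cell labels back to points; a bound cell takes the label of
    #    its nearest adjacent core cell, or stays noise if it has none.
    labels = [-1] * len(points)
    for key, members in cells.items():
        if key in core:
            label = cell_label[key]
        else:
            adjacent_core = [nb for nb in neighbors(key) if nb in core]
            if not adjacent_core:
                continue
            c = centroid(key)
            nearest = min(adjacent_core,
                          key=lambda nb: math.dist(c, centroid(nb)))
            label = cell_label[nearest]
        for i in members:
            labels[i] = label
    return labels


# Two dense groups, one border point next to the first group, one outlier.
points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),  # dense cell (0, 0)
    (1.1, 0.5),                                      # sparse cell beside it
    (5.1, 5.2), (5.3, 5.4), (5.5, 5.1),              # dense cell (5, 5)
    (9.5, 0.5),                                      # isolated point
]
labels = grid_density_cluster(points, cell_size=1.0, min_pts=3)
print(labels)
```

Because every point is touched once during binning and each cell inspects only a fixed number of neighbors, the work in this sketch grows roughly linearly with the number of points for a fixed dimension, which is the property the abstract attributes to grid-based methods.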
