...
Published in: Pattern Recognition: The Journal of the Pattern Recognition Society

A distance based clustering method for arbitrary shaped clusters in large datasets

Abstract

Clustering has been widely used in many fields of science, technology, and the social sciences. Clusters in a dataset are naturally of arbitrary (non-convex) shape. One important class of clustering methods is distance-based; however, distance-based clustering methods usually find convex-shaped clusters. The classical single-link method is a distance-based clustering method that can find arbitrary-shaped clusters. It scans the dataset multiple times and has a time requirement of O(n²), where n is the size of the dataset. This is potentially a severe problem for large datasets. In this paper, we propose a distance-based clustering method, l-SL, to find arbitrary-shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; the single-link method (with a distance stopping criterion) is then applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied to the dataset directly. The clustering result of l-SL may deviate slightly from the final clustering of the single-link method (with a distance stopping criterion) applied directly to the dataset. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real-world and synthetic datasets. The experimental results show the effectiveness of the proposed clustering methods for large datasets.
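The two-stage l-SL pipeline described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes Euclidean distance and assigns each point to the first existing leader within the leader threshold `tau`. It also implements single-link with a distance stopping criterion `h` as connected components of the "within distance h" graph over the leaders, which is equivalent to stopping single-link merges once the closest pair of clusters is more than `h` apart. The names `leaders`, `single_link`, `l_sl`, `tau`, and `h` are hypothetical, and the paper's improvement method is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def leaders(points: Sequence[Point], tau: float) -> Tuple[List[Point], List[int]]:
    """Single-pass leaders clustering (assumed first-fit, Euclidean distance).

    Returns the list of leaders and, for each input point, the index of its leader.
    """
    leader_pts: List[Point] = []
    assignment: List[int] = []
    for p in points:
        for i, lead in enumerate(leader_pts):
            if math.dist(p, lead) <= tau:
                assignment.append(i)  # follow the first leader within tau
                break
        else:
            leader_pts.append(p)  # no leader close enough: p becomes a new leader
            assignment.append(len(leader_pts) - 1)
    return leader_pts, assignment


def single_link(points: Sequence[Point], h: float) -> List[int]:
    """Single-link clustering with distance stopping criterion h (assumed <= h).

    Computed as connected components of the graph joining points within distance h,
    using union-find. This naive version does O(m^2) distance computations.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= h:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0..k-1 in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


def l_sl(points: Sequence[Point], tau: float, h: float) -> List[int]:
    """Hypothetical l-SL sketch: leaders first, then single-link on the leaders only."""
    leader_pts, assignment = leaders(points, tau)
    leader_labels = single_link(leader_pts, h)
    # Every point inherits the cluster label of its leader.
    return [leader_labels[a] for a in assignment]


if __name__ == "__main__":
    pts = [(0, 0), (0.5, 0), (1, 0), (10, 0), (10.5, 0)]
    print(l_sl(pts, tau=0.6, h=2.0))
    print(single_link(pts, h=2.0))
```

In this sketch the quadratic pairwise step runs over the m leaders rather than all n points, so it costs O(m²) distance computations plus O(nm) for the single leaders pass, instead of O(n²). On the toy input above, `l_sl` and direct `single_link` both yield two clusters, `[0, 0, 0, 1, 1]`. With larger `tau` the two results can differ, which is the deviation the abstract refers to.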