A distance based clustering method for arbitrary shaped clusters in large datasets

Patra B.K.; Nandi S.; Viswanath P.

Abstract

Clustering is widely used across science, technology, and the social sciences. Clusters in a dataset naturally have arbitrary (non-convex) shapes. Distance based methods form one important class of clustering methods, but they usually find only clusters of convex shape. The classical single-link method is a distance based clustering method that can find arbitrary shaped clusters. However, it scans the dataset multiple times and has a time requirement of O(n^2), where n is the size of the dataset, which is potentially a severe problem for a large dataset. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; the single-link method (with a distance stopping criterion) is then applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering and is considerably faster than the single-link method applied to the dataset directly. The clustering produced by l-SL may deviate slightly from the final clustering of the single-link method (with a distance stopping criterion) applied to the dataset directly. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted on standard real world and synthetic datasets, and the results show the effectiveness of the proposed clustering methods for large datasets.
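The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`leaders`, `single_link_cut`, `l_sl`), the two thresholds `tau` and `h`, the use of Euclidean distance, and the "less than or equal" comparisons are all assumptions made for illustration, and the abstract does not say how the two thresholds relate. The improvement method mentioned in the abstract is not covered. Single-link stopped at distance `h` is computed here as the connected components of the graph that links points at most `h` apart, which yields the same flat clustering.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def leaders(points: List[Point], tau: float) -> Tuple[List[int], List[int]]:
    """One-pass leaders clustering (tau is an assumed threshold).

    Returns (leader_idx, follower_of): leader_idx holds the indices of the
    points chosen as leaders, and follower_of[i] is the position in
    leader_idx of the leader that point i was assigned to.
    """
    leader_idx: List[int] = []
    follower_of: List[int] = []
    for i, p in enumerate(points):
        for k, j in enumerate(leader_idx):
            if math.dist(p, points[j]) <= tau:
                follower_of.append(k)  # first leader within tau wins
                break
        else:
            leader_idx.append(i)  # no leader close enough: p becomes one
            follower_of.append(len(leader_idx) - 1)
    return leader_idx, follower_of


def single_link_cut(points: List[Point], h: float) -> List[int]:
    """Single-link with a distance stopping criterion h (assumed cutoff).

    Implemented with union-find over all pairs at distance <= h, so the
    cost is quadratic in the number of points passed in.
    """
    parent = list(range(len(points)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.dist(points[a], points[b]) <= h:
                parent[find(a)] = find(b)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    label_of: dict = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(len(points))]


def l_sl(points: List[Point], tau: float, h: float) -> List[int]:
    """Sketch of the hybrid scheme: leaders first, then single-link on leaders."""
    leader_idx, follower_of = leaders(points, tau)
    leader_labels = single_link_cut([points[j] for j in leader_idx], h)
    # Each point inherits the cluster label of its leader.
    return [leader_labels[k] for k in follower_of]
```

Because the quadratic single-link stage runs only on the leaders rather than on all n points, the sketch shows where the speed-up claimed in the abstract would come from: the fewer leaders the first pass produces, the cheaper the second stage.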

Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society, 2011, Issue 12, 9 pages
Authors: Patra B.K.; Nandi S.; Viswanath P.
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Arbitrary shaped clusters; Distance based clustering; Hybrid clustering method; Large datasets; Leaders; Single-link
