首页> 外文期刊>Expert systems with applications >A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree
【24h】

A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree

机译:基于局部最近邻居的快速混合聚类技术

获取原文
获取原文并翻译 | 示例
           

摘要

With rapid explosion of information, clustering emerged as an active research area for knowledge discovery. Most of the existing clustering algorithms become ineffective when inappropriate parameters are provided or applied on a dataset which consists of clusters of diverse shapes, sizes, and varying densities. To overcome these issues, many graph based hybrid clustering algorithms have been proposed but these algorithms first generate a complete graph of the dataset which takes O(N-2) time where N is the number of data points which limits their application on large datasets. This paper proposes an algorithm namely a fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree to reduce the computational overhead. In the first step, the algorithm partitions the dataset into large number of sub-clusters based on dispersion of data points to capture the geometry of clusters. After partitioning the dataset, a minimum spanning tree based on the centroids of each of the sub-clusters is constructed to identify the adjacent pairs. A novel merge method is proposed to find the genuine clusters by repeatedly merging the adjacent sub-clusters. The cohesion and intra-similarity are introduced to compute the level of dispersion of data points with respect to the centers of an adjacent pair and average edge weight of a sub-clusters respectively. The algorithm takes O(N-3/2) time which is root N factor improvement over the popular hybrid clustering algorithms. Experimental analyses on both synthetic as well as gene expression datasets demonstrate that the proposed technique shows significant improvement over competing clustering algorithms in terms of execution time and improved cluster quality. Moreover, the proposed algorithm does not require any user defined parameters and it can estimate the number of clusters more accurately. (C) 2019 Elsevier Ltd. All rights reserved.
机译:随着信息的快速爆炸,集群被出现为知识发现的活跃研究领域。当提供不适当的参数或应用包括不同形状,尺寸和不同密度的集群组成的数据集时,大多数现有聚类算法都变得无效。为了克服这些问题,已经提出了许多基于图的混合聚类算法,但是这些算法首先生成数据集的完整图,该数据集需要O(n-2)时间,其中n是限制其在大型数据集上的数据点的数量。本文提出了一种算法,即使用最小生成树的基于本地最近邻居的快速混合聚类技术,以减少计算开销。在第一步中,算法基于数据点的色散将数据集分为大量子集群,以捕获群集的几何形状。在分区数据集之后,构建基于每个子集群的质心的最小生成树以识别相邻对。建议通过重复合并相邻的子集群来找到新的合并方法来找到真正的群集。引入凝聚力和帧内相似性以分别计算数据点的色谱分散水平和分别的子簇的平均边缘重量的分散。该算法需要O(n-3/2)时间,这是流行混合聚类算法的根部n因子改进。合成和基因表达数据集的实验分析表明,在执行时间和改进的集群质量方面,所提出的技术在竞争聚类算法上显示出显着改善。此外,所提出的算法不需要任何用户定义的参数,并且它可以更准确地估计群集的数量。 (c)2019 Elsevier Ltd.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号