Journal of Intelligent Information Systems

A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets


Abstract

Clustering, an unsupervised learning technique, has been used extensively for knowledge discovery because it depends little on domain knowledge. Many clustering techniques have been proposed in the literature to recognize clusters with different characteristics. Most of them become inadequate either because they depend on user-defined parameters or when they are applied to multi-scale datasets. Hybrid clustering techniques have been proposed to take advantage of both partitional and hierarchical techniques by first partitioning the dataset into several dense sub-clusters and then merging them into the actual clusters. However, the partitioning and merging criteria are not universal enough to capture many characteristics of the clusters. The minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after sparsification of the graph. In this paper, we propose a parameter-free, efficient hybrid clustering algorithm based on the minimum spanning tree to cluster multi-scale datasets. In the first phase, we construct an MST of the dataset to capture the neighborhood information of the data points and apply the box plot, an outlier detection technique, to the MST edge weights to select the inconsistent edges; removing these edges partitions the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion to find the genuine clusters by iteratively merging only pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity, computed from the topology of the two adjacent sub-clusters, which helps to identify clusters of arbitrary shape and varying density. Experimental results on various synthetic and real-world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms.
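
The first phase described in the abstract (build an MST, then cut edges whose weights are box-plot outliers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance measure or the exact fence, so Euclidean distance and the conventional upper fence Q3 + 1.5 × IQR are assumptions here, and the function names `mst_edges` and `boxplot_partition` are hypothetical. The second-phase merging criterion is not specified in the abstract and is therefore not sketched.

```python
import math
from statistics import quantiles


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, u, v) tuples, one per MST edge.
    Euclidean distance is an assumption; the abstract names no metric.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def boxplot_partition(points, whisker=1.5):
    """Cut MST edges above the upper box-plot fence; return one label per point.

    whisker=1.5 is the conventional box-plot constant (an assumption, not a
    value taken from the paper). Needs at least 3 points, i.e. 2 MST edges.
    """
    edges = mst_edges(points)
    q1, _, q3 = quantiles([w for w, _, _ in edges], n=4)
    fence = q3 + whisker * (q3 - q1)

    # Union-find over the edges that survive the cut.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for w, u, v in edges:
        if w <= fence:
            root[find(u)] = find(v)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(float(x), 0.0) for x in range(5)] + [(100.0 + x, 0.0) for x in range(5)]
    print(boxplot_partition(pts))
```

In the toy example, the single long MST edge between the two groups exceeds the fence and is removed, so each group becomes its own sub-cluster. Prim's algorithm on the complete graph costs O(n²) time, which is adequate for a sketch but not for large datasets.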
