A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Mishra Gaurav; Mohanty Sraban Kumar

Abstract

Clustering, an unsupervised learning technique, has been used extensively for knowledge discovery because it depends little on domain knowledge. Many clustering techniques have been proposed in the literature to recognize clusters of different characteristics. Most of them become inadequate either because they depend on user-defined parameters or when they are applied to multi-scale datasets. Hybrid clustering techniques have been proposed to take advantage of both partitional and hierarchical techniques by first partitioning the dataset into several dense sub-clusters and then merging them into actual clusters. However, the partitioning and merging criteria are not universal enough to capture many characteristics of the clusters. The minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after sparsification of the graph. In this paper, we propose a parameter-free, efficient, MST-based hybrid clustering algorithm for multi-scale datasets. In the first phase, we construct an MST of the dataset to capture the neighborhood information of the data points and apply the box-plot, an outlier detection technique, to the MST edge weights to select the inconsistent edges whose removal partitions the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion that finds the genuine clusters by iteratively merging only pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity, using the topology of each adjacent pair, which helps to identify clusters of arbitrary shape and varying density. Experimental results on various synthetic and real-world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms.
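
The first phase described in the abstract (build an MST, flag unusually long edges with a box-plot rule, and cut them to obtain sub-clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact box-plot rule, so the common Tukey upper fence Q3 + 1.5 × IQR is assumed here, and the function names `mst_edges`, `boxplot_upper_fence`, and `partition_by_boxplot` are hypothetical. The second-phase merging criterion is not specified in the abstract and is therefore not sketched.

```python
import math
import statistics


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, u, v) edges; O(n^2) time, fine for small inputs.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def boxplot_upper_fence(weights, k=1.5):
    """Assumed rule: Tukey's upper fence Q3 + k * IQR (k = 1.5 by convention)."""
    q1, _, q3 = statistics.quantiles(weights, n=4)
    return q3 + k * (q3 - q1)


def partition_by_boxplot(points, k=1.5):
    """Cut MST edges above the fence; return one sub-cluster label per point."""
    edges = mst_edges(points)
    root = list(range(len(points)))  # union-find parent array

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    # statistics.quantiles needs at least two values; with fewer, cut nothing.
    weights = [w for w, _, _ in edges]
    fence = boxplot_upper_fence(weights, k) if len(weights) >= 2 else math.inf

    for w, u, v in edges:
        if w <= fence:               # keep "consistent" edges only
            root[find(u)] = find(v)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]
```

For example, calling `partition_by_boxplot` on two well-separated groups of points on a line cuts the single long MST edge between them and assigns each group its own label. Note that the fence is computed from the data itself, which is what makes this step free of user-defined thresholds apart from the conventional constant `k`.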

Bibliographic details

Source: Journal of Intelligent Information Systems, 2020, Issue 3, pp. 587-606 (20 pages)
Authors: Mishra Gaurav; Mohanty Sraban Kumar
Affiliation (both authors): PDPM Indian Inst Informat Technol Design & Mfg, Dept Comp Sci & Engn, Jabalpur, India
Original format: PDF
Language: English
Keywords: Partitioning and merging approach; Minimum spanning tree based clustering; Box-plot method; Clustering multi-scale datasets

Similar literature

1. Minimum spanning tree based split-and-merge: A hierarchical clustering method [J]. Zhong C., Miao D., Fränti P. Information Sciences: An International Journal. 2011, Issue 16.
2. A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree [J]. Mishra Gaurav, Mohanty Sraban Kumar. Expert Systems with Applications. 2019, Oct. issue.
3. A partition scheme for clustering based on sequential representation of minimum spanning tree [C]. Guanwei Wang, Chunxia Zhang, Qingyan Yin. International Conference on Industrial Electronics and Engineering. 2015.
4. A minimum spanning tree based clustering algorithm for high throughput biological data [D]. Pirim, Harun. 2011.
5. Visualization of very large high-dimensional data sets as minimum spanning trees [O]. Daniel Probst, Jean-Louis Reymond. 2020.
6. Ant-MST: An ant-based minimum spanning tree for gene expression data clustering [O]. Deyu Zhou, Yulan He, Chee Keong Kwoh. 2014.
