A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree

Mishra Gaurav; Mohanty Sraban Kumar

Abstract

With the rapid explosion of information, clustering has emerged as an active research area for knowledge discovery. Most existing clustering algorithms become ineffective when they are given inappropriate parameters or applied to a dataset whose clusters have diverse shapes, sizes, and densities. To overcome these issues, many graph-based hybrid clustering algorithms have been proposed, but these algorithms first generate a complete graph of the dataset, which takes O(N^2) time, where N is the number of data points, and this limits their application to large datasets. This paper proposes a fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree to reduce the computational overhead. In the first step, the algorithm partitions the dataset into a large number of sub-clusters based on the dispersion of data points, to capture the geometry of the clusters. After partitioning, a minimum spanning tree over the centroids of the sub-clusters is constructed to identify adjacent pairs. A novel merge method is proposed to find the genuine clusters by repeatedly merging adjacent sub-clusters. Two measures are introduced: cohesion, which captures the level of dispersion of data points with respect to the centers of an adjacent pair, and intra-similarity, which is the average edge weight of a sub-cluster. The algorithm takes O(N^(3/2)) time, a sqrt(N)-factor improvement over the popular hybrid clustering algorithms. Experimental analyses on both synthetic and gene expression datasets show that the proposed technique significantly improves on competing clustering algorithms in execution time and cluster quality. Moreover, the proposed algorithm does not require any user-defined parameters and estimates the number of clusters more accurately. (C) 2019 Elsevier Ltd. All rights reserved.
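
The first two stages described in the abstract (partition into sub-clusters, then a minimum spanning tree over their centroids) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain Lloyd's k-means stands in for the paper's dispersion-based partitioning, the number of sub-clusters is assumed to be about sqrt(N), and the merge step is left out because the abstract does not give the cohesion and intra-similarity formulas. All function names are hypothetical.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def partition(points, k, iters=10, seed=0):
    """Stand-in partitioner: plain Lloyd's k-means.

    ASSUMPTION: the paper partitions by dispersion of data points;
    k-means is used here only to obtain some sub-clusters to work with.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # Recompute centers; keep the old center if a group went empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def centroid_mst(centers):
    """Prim's algorithm on the complete graph over the centroids.

    Returns a list of (i, j, distance) edges; each edge marks a pair of
    adjacent sub-clusters, i.e. a candidate for merging.
    """
    k = len(centers)
    in_tree = [False] * k
    best = [math.inf] * k   # cheapest known connection to the tree
    parent = [-1] * k
    edges = []
    if k:
        best[0] = 0.0
    for _ in range(k):
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(k):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def adjacent_subclusters(points):
    """Partition a non-empty list of points into about sqrt(N) sub-clusters
    (assumed count) and return the sub-clusters plus the centroid MST edges."""
    k = max(1, round(math.sqrt(len(points))))
    groups = partition(points, k)
    centers = [centroid(g) for g in groups]
    return groups, centroid_mst(centers)
```

In this sketch, with k close to sqrt(N), each assignment pass compares N points against sqrt(N) centers, and Prim's algorithm over k centroids examines about k^2 = N pairs, so no step builds the full N-by-N graph. A merge routine would then walk the returned edges and decide, pair by pair, whether two adjacent sub-clusters belong together.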

Bibliographic details

Source
Expert Systems with Applications | 2019, issue 10 | pp. 28-43 | 16 pages
Authors
Mishra Gaurav; Mohanty Sraban Kumar
Author affiliation
PDPM Indian Inst Informat Technol Design & Mfg, Dept Comp Sci & Engn, Jabalpur, India
Original format
PDF
Language
English
Keywords
Hybrid clustering; Local nearest neighbor; Minimum spanning tree; Dispersion; Merge index; Gene expression dataset
