A Novel K-Means Hierarchical Clustering Algorithm for Efficient Information Extraction from Large Data Sets

Abstract

Recent emphasis on exploratory data analysis targeted to identifying groups or structures in multivariate data has contributed to the development of a number of clustering techniques. Among these, hierarchical methods and K-means clustering have emerged as two popular approaches. In this paper, we shall present an algorithm that efficiently combines the strong points of these two approaches in a novel two-stage process that outperforms the individual components in both accuracy of representation and computation time. To make the combination more effective, we propose a new initialization scheme for the K-means stage to achieve improved codebook placement. We also propose a novel visualization scheme that combines the Principal Component Analysis (PCA) and Minimal Spanning Tree (MST) in an arrangement that ensures reliability of the visualization. The performance of the clustering and visualization techniques proposed here is illustrated by application to a challenging data set popularly used as a benchmark.
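
The abstract gives only the outline of the two-stage process, so the following is a minimal sketch of one common way to pair the two techniques, not the authors' algorithm. It assumes K-means runs first to build an oversized codebook, and the codebook vectors are then merged hierarchically (size-weighted centroid linkage) down to the requested number of clusters. The random initialization is plain sampling rather than the initialization scheme the paper proposes, and all function and parameter names (`kmeans`, `merge_centroids`, `two_stage_cluster`, `codebook_size`) are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: simple random sampling, NOT the paper's initialization scheme.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cell keeps its previous position
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels


def merge_centroids(centroids, weights, n_clusters):
    """Agglomerate codebook vectors until n_clusters groups remain.

    Returns, for each original centroid, the index of its final group.
    """
    groups = [[i] for i in range(len(centroids))]
    centers = list(centroids)
    sizes = list(weights)
    while len(groups) > n_clusters:
        # Find the closest pair of current group centers (a < b).
        a, b = min(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: dist(centers[ij[0]], centers[ij[1]]))
        # Merge b into a; the new center is the size-weighted mean.
        total = sizes[a] + sizes[b]
        centers[a] = tuple((sizes[a] * x + sizes[b] * y) / total
                           for x, y in zip(centers[a], centers[b]))
        sizes[a] = total
        groups[a] += groups[b]
        del groups[b], centers[b], sizes[b]
    group_of = {m: g for g, members in enumerate(groups) for m in members}
    return [group_of[i] for i in range(len(centroids))]


def two_stage_cluster(points, codebook_size, n_clusters, seed=0):
    """Stage 1: K-means codebook. Stage 2: hierarchical merge of the codebook."""
    centroids, labels = kmeans(points, codebook_size, seed=seed)
    # Empty cells get weight 1 so the weighted mean never divides by zero.
    weights = [max(1, labels.count(j)) for j in range(codebook_size)]
    group_of_centroid = merge_centroids(centroids, weights, n_clusters)
    # Each point inherits the group of the codebook vector it was assigned to.
    return [group_of_centroid[lab] for lab in labels]
```

With `codebook_size` chosen well above `n_clusters`, the pairwise merging step runs on a handful of centroids instead of on every data point, which is the usual motivation for this kind of hybrid.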

Bibliographic details

Source
Proceedings of the International Conference on Information and Knowledge Engineering (IKE'03) | 2003 | pp. 390-396 | 7 pages
Authors
Somnath S. Shahapurkar; Malur K. Sundareshan
Original format: PDF
Chinese Library Classification: Information and communication theory
Keywords
clustering; classification; knowledge extraction; hierarchical methods; K-means; SOM

Similar literature

1. Comparative Study of K-Means, Partitioning Around Medoids, Agglomerative Hierarchical, and DIANA Clustering Algorithms by Using Cancer Datasets [J]. Bipul Hossen, Rabiul Auwul. Biomedical Statistics and Informatics. 2020, Issue 1.
2. Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering [J]. Wangchamhan Tanachapong, Chiewchanwattana Sirapat, Sunat Khamron. Expert Systems with Applications. 2017, Issue deca30.
3. Leaders-Subleaders: An efficient hierarchical clustering algorithm for large data sets [J]. P.A. Vijaya, M. Narasimha Murty, D.K. Subramanian. Pattern Recognition Letters. 2004, Issue 4.
4. A Novel K-Means Hierarchical Clustering Algorithm for Efficient Information Extraction from Large Data Sets [C]. Somnath S. Shahapurkar, Malur K. Sundareshan. International Conference on Information and Knowledge Engineering. 2003.
5. Efficient genetic k-means clustering algorithm and its application to data mining on different domains [D]. Alsayat, Ahmed Mosa. 2016.
6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree and Hierarchical Clustering in an Applied Study [O]. Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi. 2020.
7. Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters [O]. Li Chun Sheng. 2011.
