A Novel K-Means Hierarchical Clustering Algorithm for Efficient Information Extraction from Large Data Sets



Abstract

Recent emphasis on exploratory data analysis targeted to identifying groups or structures in multivariate data has contributed to the development of a number of clustering techniques. Among these, hierarchical methods and K-means clustering have emerged as two popular approaches. In this paper, we shall present an algorithm that efficiently combines the strong points of these two approaches in a novel two-stage process that outperforms the individual components in both accuracy of representation and computation time. To make the combination more effective, we propose a new initialization scheme for the K-means stage to achieve improved codebook placement. We also propose a novel visualization scheme that combines the Principal Component Analysis (PCA) and Minimal Spanning Tree (MST) in an arrangement that ensures reliability of the visualization. The performance of the clustering and visualization techniques proposed here is illustrated by application to a challenging data set popularly used as a benchmark.
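The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the order of the stages, the linkage rule, or the proposed initialization scheme, so everything below is an assumption: K-means is run first to place a small codebook, the codebook vectors are then merged by single-linkage agglomeration, and a plain farthest-point initialization stands in for the authors' scheme. The function names (`kmeans`, `merge_codebook`, `two_stage`) and the toy data are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(data, k, iters=50, seed=0):
    """Stage 1 (assumed order): Lloyd's K-means returning k codebook vectors."""
    rng = random.Random(seed)
    # Farthest-point initialization: a stand-in, NOT the paper's scheme.
    centers = [rng.choice(data)]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # An empty group keeps its previous center.
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def merge_codebook(centers, n_clusters):
    """Stage 2 (assumed): single-linkage agglomeration of the codebook vectors."""
    clusters = [[i] for i in range(len(centers))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist2(centers[i], centers[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    # Map each codebook index to its final cluster label.
    return {i: label for label, members in enumerate(clusters) for i in members}


def two_stage(data, k_codebook, n_clusters):
    """Label each point by the merged cluster of its nearest codebook vector."""
    centers = kmeans(data, k_codebook)
    label_of = merge_codebook(centers, n_clusters)
    return [label_of[min(range(len(centers)), key=lambda i: dist2(p, centers[i]))]
            for p in data]


# Toy data: two well-separated 2-D blobs of 30 points each.
rng = random.Random(1)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(30)]
labels = two_stage(data, k_codebook=6, n_clusters=2)
print(labels[:3], labels[-3:])
```

The pairwise merge in the second stage operates on the codebook vectors rather than on every data point, which is one plausible reading of how such a combination could reduce computation time on large data sets.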


Bibliographic details

Source: International Conference on Information and Knowledge Engineering, 2003, 7 pages
Authors: Somnath S. Shahapurkar; Malur K. Sundareshan
Original format: PDF
Chinese Library Classification: Information and communication theory
Keywords: clustering; classification; knowledge extraction; hierarchical methods; K-means; SOM


Similar literature


1. Comparative Study of K-Means, Partitioning Around Medoids, Agglomerative Hierarchical, and DIANA Clustering Algorithms by Using Cancer Datasets [J]. Bipul Hossen, Rabiul Auwul. Biomedical Statistics and Informatics, 2020, No. 1.

2. Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering [J]. Wangchamhan Tanachapong, Chiewchanwattana Sirapat, Sunat Khamron. Expert Systems with Applications, 2017, No. deca30.

3. Leaders-Subleaders: An efficient hierarchical clustering algorithm for large data sets [J]. P.A. Vijaya, M. Narasimha Murty, D.K. Subramanian. Pattern Recognition Letters, 2004, No. 4.

4. A Novel K-Means Hierarchical Clustering Algorithm for Efficient Information Extraction from Large Data Sets [C]. Somnath S. Shahapurkar, Malur K. Sundareshan. Proceedings of the International Conference on Information and Knowledge Engineering (IKE'03), 2003.

5. Efficient genetic k-means clustering algorithm and its application to data mining on different domains [D]. Alsayat, Ahmed Mosa. 2016.

6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm Minimum Spanning Tree and Hierarchical Clustering in an Applied Study [O]. Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi. 2020.

7. Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters [O]. Li Chun Sheng. 2011.

