Source: Proceedings of the International Conference on Information and Knowledge Engineering (IKE'03)

A Novel K-Means Hierarchical Clustering Algorithm for Efficient Information Extraction from Large Data Sets

Abstract

Recent emphasis on exploratory data analysis aimed at identifying groups or structures in multivariate data has contributed to the development of a number of clustering techniques. Among these, hierarchical methods and K-means clustering have emerged as two popular approaches. In this paper, we present an algorithm that combines the strengths of these two approaches in a novel two-stage process that outperforms each component used alone in both accuracy of representation and computation time. To make the combination more effective, we propose a new initialization scheme for the K-means stage that improves codebook placement. We also propose a novel visualization scheme that combines Principal Component Analysis (PCA) and the Minimal Spanning Tree (MST) in an arrangement that ensures reliability of the visualization. The performance of the proposed clustering and visualization techniques is illustrated by application to a challenging data set popularly used as a benchmark.
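The abstract does not say in which order the two stages run or how the K-means stage is initialized, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that K-means first compresses the data into a small codebook and that agglomerative merging then groups the codebook vectors. Random seeding stands in for the paper's initialization scheme, single linkage is an arbitrary choice of merge criterion, and all function and parameter names (`kmeans`, `agglomerate`, `two_stage_cluster`, `n_codebook`, `n_clusters`) are hypothetical.

```python
import math
import random


def euclid(p, q):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: the initial codebook is a random sample of the data.
    # The paper's own initialization scheme is not given in the abstract.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: euclid(p, centroids[j]))
                  for p in points]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: labels are consistent with centroids
        centroids = new_centroids
    return centroids, labels


def agglomerate(centroids, n_clusters):
    """Single-linkage merging of centroids; returns a group id per centroid."""
    groups = [[i] for i in range(len(centroids))]
    while len(groups) > n_clusters:
        best = None
        # Find the pair of groups with the smallest single-linkage distance.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(euclid(centroids[i], centroids[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    group_of = [0] * len(centroids)
    for g, members in enumerate(groups):
        for i in members:
            group_of[i] = g
    return group_of


def two_stage_cluster(points, n_codebook, n_clusters, seed=0):
    """Stage 1: K-means codebook. Stage 2: hierarchical merge of the codebook."""
    centroids, labels = kmeans(points, n_codebook, seed=seed)
    group_of = agglomerate(centroids, n_clusters)
    # Each point inherits the group of the codebook vector it was assigned to.
    return [group_of[lab] for lab in labels]
```

In this arrangement the quadratic-cost hierarchical step runs over `n_codebook` centroids rather than over every data point, which is one way a combination of this kind could reduce computation time on large data sets.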
