Conference: International Conference on Information and Knowledge Engineering

A Novel K-Means Hierarchical Clustering Algorithm for Efficient Information Extraction from Large Data Sets



Abstract

Recent emphasis on exploratory data analysis aimed at identifying groups or structures in multivariate data has contributed to the development of a number of clustering techniques. Among these, hierarchical methods and K-means clustering have emerged as two popular approaches. In this paper, we present an algorithm that combines the strengths of these two approaches in a novel two-stage process that outperforms each component used alone in both accuracy of representation and computation time. To make the combination more effective, we propose a new initialization scheme for the K-means stage that improves codebook placement. We also propose a novel visualization scheme that combines Principal Component Analysis (PCA) and the Minimal Spanning Tree (MST) in an arrangement that ensures the reliability of the visualization. The performance of the proposed clustering and visualization techniques is illustrated by application to a challenging data set widely used as a benchmark.
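The abstract does not spell out how the two stages fit together. As a rough sketch only, assuming that a K-means stage first quantizes the data into several prototypes (the "codebook") and that a hierarchical stage then merges those prototypes into the final clusters, the idea might look like the following. The function names, the random K-means initialization, and the single-linkage merge rule are all illustrative assumptions; in particular, this does not reproduce the initialization scheme the paper proposes.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means with random initialization (an assumption here)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def merge_centroids(centroids, n_clusters):
    """Single-linkage agglomerative merging of centroids (hypothetical rule)."""
    groups = [[j] for j in range(len(centroids))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(dist2(centroids[i], centroids[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))
    # Map each fine-grained centroid index to its merged cluster index.
    return {j: g for g, members in enumerate(groups) for j in members}


def two_stage(points, k_fine, n_clusters, seed=0):
    """Stage 1: K-means codebook. Stage 2: hierarchical merge of the codebook."""
    centroids, labels = kmeans(points, k_fine, seed=seed)
    mapping = merge_centroids(centroids, n_clusters)
    return [mapping[l] for l in labels]
```

Calling `two_stage(points, k_fine, n_clusters)` with `points` as a list of equal-length tuples returns one cluster index per point. In this arrangement the hierarchical stage works on only `k_fine` prototypes rather than on every data point, which is one plausible source of the computation-time benefit the abstract mentions.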
