Journal: International Journal of Database Management Systems

An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means



Abstract

Cluster analysis is one of the main analytical methods in data mining. K-means is the most popular partition-based clustering algorithm, but it is computationally expensive, and the quality of the resulting clusters depends heavily on the choice of initial centroids and on the dimensionality of the data. Several methods have been proposed in the literature for improving the performance of the k-means clustering algorithm. Principal Component Analysis (PCA) is an important unsupervised dimensionality reduction technique. This paper proposes a method that makes the algorithm more effective and efficient by combining PCA with a modified k-means. PCA is used in a first phase both to reduce the dimensionality of the data and to find the initial centroids for k-means. The k-means method itself is modified with a heuristic that reduces the number of distance calculations needed to assign each data point to a cluster. A comparison of the original and the new approach found that the new results are more effective and easier to understand and, above all, that the time taken to process the data was substantially reduced.
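The abstract does not give the details of either phase, so the following Python sketch is only one plausible reading of the pipeline, not the authors' algorithm. It makes two assumptions. First, initial centroids are obtained by sorting the points along the first principal component, splitting them into k equal-sized groups, and taking each group's mean. Second, the distance-saving heuristic keeps a point in its current cluster, without computing distances to the other centroids, whenever its distance to its own updated centroid has not increased. All function names (`pca_reduce`, `init_centroids_by_first_pc`, `kmeans_with_skip`, `cluster`) are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its first n_components principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def init_centroids_by_first_pc(Z, k):
    """Assumed initialization: sort by the first principal component,
    split into k equal-sized groups, use each group's mean as a centroid."""
    order = np.argsort(Z[:, 0])
    groups = np.array_split(order, k)
    return np.vstack([Z[idx].mean(axis=0) for idx in groups])


def kmeans_with_skip(Z, centroids, max_iter=100):
    """K-means with an assumed heuristic: a point whose distance to its own
    updated centroid did not grow keeps its label, so distances to the
    other centroids are computed only for the remaining points."""
    centroids = centroids.copy()
    k = len(centroids)

    # First pass: full distance matrix, as in standard k-means.
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    nearest = d[np.arange(len(Z)), labels]

    for _ in range(max_iter):
        # Update step: recompute each non-empty cluster's mean.
        new_centroids = centroids.copy()
        for j in range(k):
            members = Z[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

        # One distance per point: to its own (updated) centroid.
        own = np.linalg.norm(Z - centroids[labels], axis=1)
        recheck = own > nearest
        nearest = np.where(recheck, nearest, own)

        # Full distance computation only for points that moved farther away.
        if recheck.any():
            d = np.linalg.norm(Z[recheck][:, None, :] - centroids[None, :, :], axis=2)
            labels[recheck] = d.argmin(axis=1)
            nearest[recheck] = d.min(axis=1)

    return labels, centroids


def cluster(X, k, n_components):
    """Two-phase pipeline: PCA for reduction and seeding, then k-means."""
    Z = pca_reduce(X, n_components)
    return kmeans_with_skip(Z, init_centroids_by_first_pc(Z, k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 10)), rng.normal(5.0, 0.1, (20, 10))])
    labels, centroids = cluster(X, k=2, n_components=2)
    print(labels)
```

The skip rule is a heuristic rather than an exact speed-up: a point that has moved closer to its own centroid could still be closer to another centroid that also moved, so the result may differ slightly from standard k-means in exchange for fewer distance computations.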
