Conference: First International Conference on Integrated Intelligent Computing

An Increased Performance of Clustering High Dimensional Data Using Principal Component Analysis

Abstract

In many application domains, such as information retrieval, computational biology, and image processing, the data dimension is usually very high. Developing effective clustering methods for high-dimensional datasets is a challenging problem because of the curse of dimensionality. The k-means clustering algorithm is used in many practical applications, but it is computationally expensive, and the quality of the resulting clusters depends heavily on the choice of initial centroids and on the dimensionality of the data. When the dataset is high-dimensional, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. It is therefore necessary to reduce the dimensionality of the given dataset to improve efficiency and accuracy. This paper proposes a new approach that improves the accuracy of the clustering results by using PCA both to determine the initial centroids and to reduce the dimensionality of the data.
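The abstract does not say exactly how PCA is used to choose the initial centroids, so the following is only a minimal sketch of one plausible reading, not the authors' method. It mean-centres the data, finds the top m principal components of the covariance matrix by power iteration with deflation, and projects the data into that m-dimensional space. It then seeds k centroids with a hypothetical rule: sort the points along the first principal component, split them into k equal-sized runs, and average each run. Finally, it runs standard Lloyd's k-means from those seeds. All function names and the PC1-split seeding rule are illustrative assumptions, and the code is pure Python so that it needs no third-party libraries.

```python
import math
import random


def mean_center(data):
    """Subtract each feature's mean so PCA is computed about the origin."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_components(cov, m, iters=500, seed=0):
    """Top-m eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)  # random start avoids a vector orthogonal to the target
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                return comps
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove this component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(centered, comps):
    """Coordinates of each row in the reduced PCA space."""
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in comps] for r in centered]


def pc1_initial_centroids(reduced, k):
    """Hypothetical seeding rule (assumption, not from the paper):
    sort points along PC1, cut into k equal runs, and average each run."""
    n, dim = len(reduced), len(reduced[0])
    order = sorted(range(n), key=lambda i: reduced[i][0])
    centroids = []
    for c in range(k):
        chunk = order[c * n // k:(c + 1) * n // k]  # non-empty whenever k <= n
        centroids.append([sum(reduced[i][j] for i in chunk) / len(chunk)
                          for j in range(dim)])
    return centroids


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means from the given initial centroids; returns (labels, centroids)."""
    centroids = [c[:] for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: sum((p[j] - centroids[c][j]) ** 2
                                            for j in range(len(p))))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pca_kmeans(data, k, m):
    """Reduce data to m principal components, seed along PC1, then run k-means."""
    centered = mean_center(data)
    comps = principal_components(covariance(centered), m)
    reduced = project(centered, comps)
    return kmeans(reduced, pc1_initial_centroids(reduced, k))


data = [[0.1, 0.2, 0.0], [0.0, -0.1, 0.2], [0.2, 0.0, -0.1], [-0.1, 0.1, 0.1],
        [10.0, 10.2, 9.9], [9.8, 10.1, 10.0], [10.1, 9.9, 10.2], [10.2, 10.0, 9.8]]
labels, centroids = pca_kmeans(data, k=2, m=2)
```

On this toy data the two groups are separated along roughly the (1, 1, 1) direction, so the first principal component captures that split, and the PC1-sorted seeds already fall one in each group. The cluster label numbers themselves depend on the arbitrary sign of the eigenvector. Deterministic seeding of this kind removes the run-to-run variation of random initialization that the abstract identifies as a weakness of k-means. Whether it matches the paper's own rule cannot be determined from the abstract.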
