Improving projected clustering algorithm for high dimensional dataset

Abstract

The sparsity of high-dimensional data and the curse of dimensionality cause traditional clustering algorithms such as K-Means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to produce low-quality clusters and make their time complexity grow exponentially. Many projected clustering algorithms have been proposed to deal with noisy high-dimensional data. However, most of them encounter difficulties when the data contain clusters of low dimensionality. This paper proposes an improvement over PCKA (Projected Clustering based on the K-Means Algorithm), a partitional, distance-based projected clustering algorithm. The existing PCKA performs relevancy analysis to select a set of dimensions but performs no redundancy analysis. The Improved PCKA described in this paper performs both attribute redundancy analysis and relevancy analysis, followed by outlier detection. In the last phase, Improved PCKA forms clusters using a modified K-Means algorithm. The method used for attribute redundancy analysis is unique to this paper. The proposed algorithm detects projected clusters of better quality and reduces computational time complexity compared with the original PCKA.
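The four phases named in the abstract (redundancy analysis, relevancy analysis, outlier detection, K-Means clustering) can be illustrated with a minimal sketch. The abstract does not describe how any phase is implemented, so every step here is a stand-in chosen for illustration and is not the paper's method: a Pearson-correlation filter for redundancy, a histogram-peak test for relevancy, a k-nearest-neighbour distance rule for outliers, and plain Lloyd's K-Means with farthest-first seeding in place of the "modified K-Means". All function names, thresholds, and the synthetic data are hypothetical. The sketch also selects a single shared subspace, a simplification compared with per-cluster subspaces.

```python
import numpy as np

def drop_redundant(X, corr_threshold=0.95):
    """Indices of attributes not highly correlated with an earlier kept one.
    Stand-in for redundancy analysis (assumption, not the paper's method)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < corr_threshold for i in kept):
            kept.append(j)
    return kept

def select_relevant(X, bins=10, peak_ratio=2.0):
    """Indices of attributes whose densest histogram bin holds at least
    peak_ratio times the count expected under a uniform spread."""
    n = X.shape[0]
    kept = []
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=bins)
        if counts.max() >= peak_ratio * n / bins:
            kept.append(j)
    return kept

def inlier_mask(X, k=5, z=3.0):
    """True for points whose mean distance to their k nearest neighbours
    is at most z standard deviations above the average."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
    return knn <= knn.mean() + z * knn.std()

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's K-Means with farthest-first initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[[rng.integers(len(X))]]
    while len(centers) < k:
        gap = np.linalg.norm(X[:, None] - centers[None], axis=2).min(axis=1)
        centers = np.vstack([centers, X[np.argmax(gap)]])
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        updated = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers

def projected_kmeans(X, k):
    """Run the four phases in order; returns kept attribute indices and labels."""
    cols = drop_redundant(X)
    cols = [cols[i] for i in select_relevant(X[:, cols])]
    reduced = X[:, cols]
    mask = inlier_mask(reduced)
    labels = np.full(len(X), -1)  # -1 marks points flagged as outliers
    labels[mask] = kmeans(reduced[mask], k)[0]
    return cols, labels

def demo_data(seed=0, per_cluster=50):
    """Synthetic data: 4 clusters in attributes 0-1, attribute 2 duplicates
    attribute 0, attributes 3-4 are uniform noise, the last row is an outlier."""
    rng = np.random.default_rng(seed)
    corners = np.array([[0, 0], [0, 10], [10, 0], [10, 10]], dtype=float)
    xy = np.vstack([c + rng.normal(0, 0.3, (per_cluster, 2)) for c in corners])
    xy = np.vstack([xy, [5.0, 5.0]])
    dup = 2 * xy[:, [0]] + rng.normal(0, 0.01, (len(xy), 1))
    noise = rng.uniform(0, 10, (len(xy), 2))
    return np.hstack([xy, dup, noise])

cols, labels = projected_kmeans(demo_data(), k=4)
print("kept attributes:", cols)
print("points flagged as outliers:", int((labels == -1).sum()))
```

The order follows the abstract: dropping redundant attributes first means the relevancy test and the distance computations run on fewer dimensions, which is one plausible source of the reduced running time the abstract reports. Farthest-first seeding is used here only to make the toy run deterministic on well-separated clusters.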
