
Improving a projected clustering algorithm for high-dimensional datasets




The sparsity of high-dimensional data and the curse of dimensionality cause traditional clustering algorithms such as K-Means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to produce low-quality clusters and cause their time complexity to increase exponentially. Many projected clustering algorithms have been proposed to deal with noisy high-dimensional data. However, most of them encounter difficulties when the data contain clusters that lie in subspaces of low dimensionality. This paper proposes an improvement over PCKA (Projected Clustering based on the K-Means Algorithm), a partitional, distance-based projected clustering algorithm. The existing PCKA performs relevance analysis to select a set of dimensions, but it makes no provision for redundancy analysis. The Improved PCKA described in this paper performs both attribute redundancy analysis and attribute relevance analysis, followed by outlier detection. In the last phase, Improved PCKA forms clusters using a modified K-Means algorithm. The method used for attribute redundancy analysis is unique to this paper. Compared with the original PCKA, the proposed algorithm detects projected clusters of better quality and reduces the computational time.
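To make the idea of attribute redundancy analysis concrete, the following is a minimal sketch of a generic redundancy filter. It is not the method proposed in this paper, which the abstract does not specify. It assumes, purely for illustration, that an attribute is redundant when its absolute Pearson correlation with an already kept attribute reaches a threshold. The names `pearson`, `drop_redundant`, and `threshold`, as well as the toy data, are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def drop_redundant(rows, threshold=0.95):
    """Greedy stand-in filter: return the indices of the attributes that are kept.

    An attribute is kept only if its absolute correlation with every
    previously kept attribute is below `threshold` (an assumed criterion).
    """
    columns = list(zip(*rows))  # transpose rows into attribute columns
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: attribute 1 is exactly twice attribute 0, so it carries no new information.
rows = [
    (1.0, 2.0, 5.0),
    (2.0, 4.0, 1.0),
    (3.0, 6.0, 4.0),
    (4.0, 8.0, 2.0),
]
kept = drop_redundant(rows)
print(kept)  # [0, 2]
```

In a pipeline of the kind outlined above, a filter like this would run before relevance analysis, so that the later phases operate on fewer attributes.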



