Improving projected clustering algorithm for high dimensional dataset



Abstract

Sparsity and the curse of dimensionality in high-dimensional data cause traditional clustering algorithms such as K-Means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to produce low-quality clusters, and they increase the time complexity exponentially. Many projected clustering algorithms have been proposed to deal with noisy high-dimensional data. However, most of them encounter difficulties when the data contain clusters of low dimensionality. This paper proposes an improvement over PCKA (Projected Clustering based on the K-Means Algorithm), a partitional, distance-based projected clustering algorithm. The existing PCKA performs relevance analysis to select a set of dimensions, but it has no provision for redundancy analysis. The Improved PCKA described in this paper performs both attribute redundancy and relevance analysis, followed by outlier detection. In the last phase, Improved PCKA forms clusters using a modified K-Means algorithm. The method used for attribute redundancy analysis is unique to this paper. The proposed algorithm detects projected clusters of better quality and reduces computational time complexity compared to the original PCKA.
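The abstract names four phases (redundancy analysis, relevance analysis, outlier detection, K-Means on the selected dimensions) but does not say how each is computed. The sketch below is only an illustration of that pipeline shape, not the paper's method. Every function name, threshold, and criterion in it is an assumption: Pearson correlation stands in for the redundancy test, a histogram peak test for relevance, a neighbor count for outlier detection, and plain Lloyd iterations with farthest-first initialization for the modified K-Means. Running redundancy before relevance is also an assumed ordering.

```python
import math

def pearson(a, b):
    """Pearson correlation of two sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def non_redundant_dims(data, max_corr=0.95):
    """Assumed redundancy test: drop a dimension that is highly
    correlated with a dimension that was already kept."""
    cols = list(zip(*data))
    kept = []
    for j in range(len(cols)):
        if all(abs(pearson(cols[j], cols[k])) < max_corr for k in kept):
            kept.append(j)
    return kept

def relevant_dims(data, dims, bins=5, peak_factor=1.5):
    """Assumed relevance test: keep a dimension whose histogram has a
    bin clearly denser than a uniform spread would produce."""
    n = len(data)
    kept = []
    for j in dims:
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            continue  # a constant dimension carries no cluster structure
        counts = [0] * bins
        for v in col:
            counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
        if max(counts) >= peak_factor * n / bins:
            kept.append(j)
    return kept

def dist(p, q, dims):
    """Euclidean distance restricted to the selected dimensions."""
    return math.sqrt(sum((p[j] - q[j]) ** 2 for j in dims))

def inlier_indices(data, dims, radius=1.0, min_neighbors=2):
    """Assumed outlier test: a point with too few neighbors is an outlier."""
    keep = []
    for i, p in enumerate(data):
        near = sum(1 for m, q in enumerate(data)
                   if m != i and dist(p, q, dims) <= radius)
        if near >= min_neighbors:
            keep.append(i)
    return keep

def kmeans(points, dims, k, iters=20):
    """Lloyd iterations on the selected dimensions, farthest-first init."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist(p, c, dims) for c in centers)))
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c], dims))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                for j in dims:
                    centers[c][j] = sum(p[j] for p in members) / len(members)
    return labels

def projected_clustering_sketch(data, k):
    """Run the four phases in order; outliers receive the label -1."""
    dims = relevant_dims(data, non_redundant_dims(data))
    inliers = inlier_indices(data, dims)
    labels = [-1] * len(data)
    found = kmeans([data[i] for i in inliers], dims, k)
    for i, lab in zip(inliers, found):
        labels[i] = lab
    return dims, labels

# Toy data: dimension 1 duplicates dimension 0, dimension 2 is spread evenly,
# and the last point sits between the two clusters.
cluster_a = [[0.1 * i, 0.2 * i, 2.0 * i] for i in range(5)]
cluster_b = [[10 + 0.1 * i, 20 + 0.2 * i, 2.0 * i + 1] for i in range(5)]
data = cluster_a + cluster_b + [[5.0, 10.0, 4.5]]
dims, labels = projected_clustering_sketch(data, k=2)
print(dims, labels)
```

On this toy input the duplicated dimension is caught by the redundancy step and the evenly spread one by the relevance step, so clustering runs on a single dimension. The thresholds are tuned to the toy data and would need to be chosen per dataset.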


Bibliographic details

Source
IEEE International Conference on Recent Trends in Electronics, Information Communication Technology | 2016 | pp. 1411-1415 | 5 pages
Authors
Madhuri Dighe; Gajanan Gawde
Original format: PDF
Keywords
Clustering algorithms; Algorithm design and analysis; Partitioning algorithms; Shape; Redundancy; Feature extraction; Time complexity


Similar literature


1. An improved frequency based agglomerative clustering algorithm for detecting distinct clusters on two dimensional dataset [J]. Madheswaran M., Sreedhar Kumar S. Journal of Engineering and Technology Research. 2017, No. 4

2. Genetic Algorithm Based Dimensionality Reduction for Improving Performance of K-Means Clustering: A Case Study for Categorization of Medical Dataset [J]. Asha Gowda Karegowda, Vidya T. Shama, M.A. Jayaram. International Journal of Soft Computing. 2012, No. 5

3. Improving Projected Clustering Algorithm for High Dimensional Dataset [C]. Madhuri Dighe, Gajanan Gawde. IEEE International Conference on Recent Trends in Electronics, Information Communication Technology. 2016

4. Classification and Dimensional Reduction Algorithms for Very Large Biomedical Datasets [D]. Li, Huamin. 2017

5. SWIFT—Scalable Clustering for Automated Identification of Rare Cell Populations in Large High-Dimensional Flow Cytometry Datasets Part 1: Algorithm Design [O]. Iftekhar Naim, Suprakash Datta, Jonathan Rebhahn

6. Parallel Algorithms For Clustering High-Dimensional Large-Scale Datasets [O]. Harsha Nagesh, Sanjay Goil, Alok Choudhary. 2007

7. Evaluation of Hierarchical Clustering Algorithms for Document Datasets [R]. Zhao, Y., Karypis, G. 2002
