An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means

Tajunisha; Saravanan


Abstract

Clustering analysis is one of the main analytical methods in data mining. K-means is the most popular partition-based clustering algorithm, but it is computationally expensive, and the quality of the resulting clusters depends heavily on the selection of the initial centroids and on the dimension of the data. Several methods have been proposed in the literature for improving the performance of the k-means clustering algorithm. Principal Component Analysis (PCA) is an important unsupervised dimensionality reduction technique. This paper proposes a method that makes the algorithm more effective and efficient by combining PCA with a modified k-means. PCA is used in a first phase both to reduce the dimension of the data and to find the initial centroids for k-means. The k-means method is then modified with a heuristic approach that reduces the number of distance calculations needed to assign each data point to a cluster. A comparison of the original and the new approach found that the results obtained are more effective and easier to understand and, above all, that the time taken to process the data was substantially reduced.
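The abstract names the two phases but not their exact rules, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. Two details are assumptions made for illustration: initial centroids are taken as the means of k equal-sized chunks of the data sorted along the first principal component, and the distance-saving heuristic keeps a point in its cluster whenever it is no farther from its updated centroid than it was before, so only the remaining points get a full scan over all centroids. The function names `pca_reduce`, `pca_initial_centroids`, and `modified_kmeans` are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pca_initial_centroids(Z, k):
    """Assumed rule: sort by the first component, split into k chunks, average each."""
    order = np.argsort(Z[:, 0])
    return np.array([Z[idx].mean(axis=0) for idx in np.array_split(order, k)])


def modified_kmeans(Z, centroids, max_iter=100):
    """k-means variant that skips the full distance scan for points that
    did not move farther from their own centroid (assumed heuristic)."""
    centroids = np.array(centroids, dtype=float)
    # Initial assignment needs all point-to-centroid distances once.
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    nearest = dists[np.arange(len(Z)), labels]
    for _ in range(max_iter):
        # Recompute each non-empty cluster's centroid from its members.
        for j in range(len(centroids)):
            members = Z[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # One distance per point: to its own, updated centroid.
        own = np.linalg.norm(Z - centroids[labels], axis=1)
        recheck = own > nearest  # only these points are compared to all centroids
        new_labels = labels.copy()
        if recheck.any():
            d = np.linalg.norm(Z[recheck][:, None, :] - centroids[None, :, :], axis=2)
            new_labels[recheck] = d.argmin(axis=1)
            own[recheck] = d.min(axis=1)
        nearest = own
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so centroids would not change either
        labels = new_labels
    return labels, centroids
```

A typical call would be `Z = pca_reduce(X, 2)` followed by `labels, centroids = modified_kmeans(Z, pca_initial_centroids(Z, k))`. Because points that pass the check are never compared against other centroids, this variant trades exactness for fewer distance computations and may assign some points differently from standard k-means.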


Bibliographic record

Source: International Journal of Database Management Systems, 2011, Issue 1
Authors: Tajunisha; Saravanan
Original format: PDF
CLC classification: TP311.13

