Fast K-Means Algorithm Clustering

Raied Salman; Vojislav Kecman; Qi Li; Robert Strack; Erik Test


Abstract

k-means has recently been recognized as one of the best algorithms for clustering unlabeled data. Because k-means depends mainly on distance calculations between every data point and every center, its time cost becomes high when the dataset is large (for example, more than 500 million points). We propose a two-stage algorithm to reduce the time cost of distance calculation for huge datasets. The first stage is a fast distance calculation that uses only a small portion of the data to find the best possible locations of the centers. The second stage is a slow distance calculation whose initial centers are taken from the first stage. The terms "fast" and "slow" refer to how quickly the centers move. In the slow stage, the whole dataset can be used to obtain the exact locations of the centers. The time cost of distance calculation in the fast stage is very low because the chosen training subset is small, and the time cost in the slow stage is also kept low because only a small number of iterations is needed. Different initial locations of the cluster centers were used when testing the proposed algorithm. For large datasets, experiments show that the two-stage clustering method achieves a speed-up of 1 to 9 times.
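The two-stage idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the small subset is chosen or how the first-stage centers are initialized, so this sketch assumes a uniform random sample (the hypothetical `sample_frac` parameter), farthest-first initialization, and plain Lloyd iterations in both stages. All function names here are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k, rng):
    """Assumed initialization (not from the paper): one random point, then
    repeatedly the point farthest from all centers chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def lloyd(points, centers, max_iter=100, tol=1e-12):
    """Plain Lloyd k-means iterations starting from the given centers.

    Returns (centers, iterations_run). Stops when no center moves by more
    than sqrt(tol), or after max_iter iterations.
    """
    centers = [tuple(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    for it in range(1, max_iter + 1):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assignment step: n * k distance computations per iteration,
        # which is the cost the two-stage scheme tries to reduce.
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(k)
        ]
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            return centers, it
    return centers, max_iter


def two_stage_kmeans(points, k, sample_frac=0.05, seed=0):
    """Fast stage on a small subset, then slow stage on the full dataset.

    Assumption: the subset is a uniform random sample of size
    sample_frac * len(points) (at least k points).
    """
    rng = random.Random(seed)
    n_sample = min(len(points), max(k, int(len(points) * sample_frac)))
    sample = rng.sample(points, n_sample)
    # Fast stage: cheap iterations on the subset move the centers quickly.
    coarse, fast_iters = lloyd(sample, farthest_first(sample, k, rng))
    # Slow stage: full-data iterations refine the centers from the coarse start.
    final, slow_iters = lloyd(points, coarse)
    return final, fast_iters, slow_iters
```

For example, `two_stage_kmeans(data, k=3)` on three well-separated Gaussian blobs returns centers near the blob means. Because the slow stage starts from centers that are already close to the full-data solution, it typically needs only a few passes over the whole dataset, which is where the saving in distance computations would come from in this sketch.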


Bibliographic Details

Source: International Journal of Computer Networks & Communications, 2011, No. 4
Authors: Raied Salman; Vojislav Kecman; Qi Li; Robert Strack; Erik Test
Original format: PDF
Chinese Library Classification: Computing technology; computer technology

Similar Articles


1. Fast and stable clustering analysis based on Grid-mapping K-means algorithm and new clustering validity index [J]. Zhu Erzhou, Zhang Yuanxiang, Wen Peng. Neurocomputing, 2019.
2. Evaluation Of Fuzzy K-Means And K-Means Clustering Algorithms In Intrusion Detection Systems [J]. Farhad Soleimanian Gharehchopogh, Neda Jabbari, Zeinab Ghaffari Azar. International Journal of Scientific & Technology Research, 2012, No. 11.
3. A fast and effective partitional clustering algorithm for large categorical datasets using a k-means based approach [J]. Ben Salem Semeh, Naouali Sami, Chtourou Zied. Computers and Electrical Engineering, 2018.
4. Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm [C]. Shi Na, Liu Xumin, Guan Yong. Intelligent Information Technology and Security Informatics (IITSI), 2010.
5. Hardware Implementation and Performance Evaluation of K-Means and K-Means++ Clustering Algorithms [D]. Singh, Manisha. 2019.
6. A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality [O]. Xueyi Wang.
7. A Highly Efficient Fast Global K-Means Clustering Algorithm [O]. Xian Liang, Fuheng Qu, Yong Yang. 2015.
