An efficient approximation to the K-means clustering for massive data

Capo Marco; Perez Aritz; Lozano Jose A.

Abstract

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. In spite of its dependency on the initial settings and the large number of distance computations that it can require to converge, the K-means algorithm remains one of the most popular clustering methods for massive datasets. In this work, we propose an efficient approximation to the K-means problem intended for massive data. Our approach recursively partitions the entire dataset into a small number of subsets, each of which is characterized by its representative (center of mass) and weight (cardinality). A weighted version of the K-means algorithm is then applied over this local representation, which can drastically reduce the number of distances computed. In addition to some theoretical properties, experimental results indicate that our method outperforms well-known approaches, such as K-means++ and minibatch K-means, in terms of the relation between the number of distance computations and the quality of the approximation. (C) 2016 Elsevier B.V. All rights reserved.
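The idea summarized in the abstract (partition the data, keep one weighted representative per subset, then run weighted K-means on the representatives) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the partitioning rule, so the median cut on the widest axis, the `max_size` stopping threshold, the weighted random initialization, and the function names `partition_representatives` and `weighted_kmeans` are all assumptions made for illustration.

```python
import numpy as np

def partition_representatives(X, max_size):
    """Split X recursively (median cut on the widest axis; an assumed rule)
    until every block holds at most max_size points. Return each block's
    center of mass and its cardinality."""
    reps, weights = [], []
    stack = [X]
    while stack:
        block = stack.pop()
        spread = block.max(axis=0) - block.min(axis=0)
        if len(block) <= max_size or spread.max() == 0:
            reps.append(block.mean(axis=0))   # representative: center of mass
            weights.append(len(block))        # weight: cardinality
            continue
        axis = int(spread.argmax())
        cut = np.median(block[:, axis])
        left = block[block[:, axis] <= cut]
        right = block[block[:, axis] > cut]
        if len(right) == 0:  # median equals the maximum: cut strictly below it
            left = block[block[:, axis] < cut]
            right = block[block[:, axis] >= cut]
        stack.extend([left, right])
    return np.array(reps), np.array(weights, dtype=float)

def weighted_kmeans(P, w, k, n_iter=50, seed=0):
    """Lloyd-style iterations on points P with weights w."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: sample k distinct representatives by weight.
    centers = P[rng.choice(len(P), size=k, replace=False, p=w / w.sum())]
    for _ in range(n_iter):
        # Squared distances from every representative to every center.
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # empty clusters keep their previous center
                new_centers[j] = np.average(P[mask], axis=0, weights=w[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Synthetic example: three Gaussian blobs of 500 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(500, 2))
               for c in ((0, 0), (5, 5), (0, 5))])
reps, weights = partition_representatives(X, max_size=50)
centers = weighted_kmeans(reps, weights, k=3)
```

Each Lloyd iteration here computes distances from the representatives to the centers rather than from all 1500 points, which is where the savings described in the abstract would come from. Because each representative is a center of mass weighted by its block size, the weighted mean of the representatives equals the mean of the full dataset.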

Bibliographic details

Source
Knowledge-Based Systems | 2017, No. 2 | pp. 56-69 | 14 pages
Authors
Capo Marco; Perez Aritz; Lozano Jose A.
Affiliations

Basque Ctr Appl Math, Bilbao 48009, Spain;

Basque Ctr Appl Math, Bilbao 48009, Spain;

Basque Ctr Appl Math, Bilbao 48009, Spain|Univ Basque Country, UPV EHU, Dept Comp Sci & Artificial Intelligence, Intelligent Syst Grp, Donostia San Sebastian 20018, Spain;

Original format: PDF
Language: eng
Keywords
K-means; Clustering; K-means++; Minibatch K-means

