
An efficient approximation to the K-means clustering for massive data


Abstract

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. In spite of its dependency on the initial settings and the large number of distance computations that it can require to converge, the K-means algorithm remains one of the most popular clustering methods for massive datasets. In this work, we propose an efficient approximation to the K-means problem intended for massive data. Our approach recursively partitions the entire dataset into a small number of subsets, each of which is characterized by its representative (center of mass) and weight (cardinality). A weighted version of the K-means algorithm is then applied over this local representation, which can drastically reduce the number of distances computed. In addition to some theoretical properties, experimental results indicate that our method outperforms well-known approaches, such as K-means++ and minibatch K-means, in terms of the relation between the number of distance computations and the quality of the approximation. © 2016 Elsevier B.V. All rights reserved.
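
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the dataset is partitioned, so the regular grid used here is an assumption, and the function names (`grid_representatives`, `weighted_kmeans`, `partition_kmeans_sketch`) are hypothetical. The sketch only shows how each subset collapses to a center of mass with a cardinality weight, and how a weighted Lloyd iteration then computes distances per representative rather than per data point.

```python
import numpy as np


def grid_representatives(X, level):
    """Split X with a regular grid (an assumed partition scheme) of 2**level
    cells per axis; return each non-empty cell's center of mass and size."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    cells = 2 ** level
    idx = np.minimum((cells * (X - lo) / span).astype(int), cells - 1)
    _, labels = np.unique(idx, axis=0, return_inverse=True)
    labels = labels.ravel()
    weights = np.bincount(labels).astype(float)  # cardinality of each subset
    reps = np.zeros((weights.size, X.shape[1]))
    np.add.at(reps, labels, X)                   # sum the points of each subset
    reps /= weights[:, None]                     # center of mass
    return reps, weights


def weighted_kmeans(reps, weights, centers, n_iter=50):
    """Lloyd iterations over weighted representatives instead of raw points."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Distances are computed once per representative, not per data point.
        d2 = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(centers.shape[0]):
            mask = assign == k
            if mask.any():  # keep the old center if a cluster is empty
                new_centers[k] = np.average(
                    reps[mask], axis=0, weights=weights[mask]
                )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def partition_kmeans_sketch(X, k, max_level=4, seed=0):
    """Refine the partition level by level, warm-starting each weighted
    K-means run from the centers found on the coarser partition."""
    rng = np.random.default_rng(seed)
    centers = None
    for level in range(1, max_level + 1):
        reps, weights = grid_representatives(X, level)
        if reps.shape[0] < k:
            continue  # partition still too coarse to seed k centers
        if centers is None:
            pick = rng.choice(reps.shape[0], size=k, replace=False)
            centers = reps[pick]
        centers = weighted_kmeans(reps, weights, centers)
    if centers is None:
        raise ValueError("partition never produced at least k subsets")
    return centers
```

A call such as `partition_kmeans_sketch(X, k=3)` on an `(n, d)` array returns a `(3, d)` array of centers. In this sketch the cost of each Lloyd iteration scales with the number of non-empty grid cells rather than with `n`, which is the source of the savings in distance computations that the abstract describes.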

Bibliographic record

  • Source
    Knowledge-Based Systems | 2017, Issue 2 | pp. 56-69 | 14 pages
  • Author affiliations

    Basque Ctr Appl Math, Bilbao 48009, Spain;

    Basque Ctr Appl Math, Bilbao 48009, Spain;

    Basque Ctr Appl Math, Bilbao 48009, Spain|Univ Basque Country, UPV EHU, Dept Comp Sci & Artificial Intelligence, Intelligent Syst Grp, Donostia San Sebastian 20018, Spain;

  • Original format: PDF
  • Text language: English
  • Keywords

    K-means; Clustering; K-means++; Minibatch K-means
