LIPIcs: Leibniz International Proceedings in Informatics

Accurate MapReduce Algorithms for k-Median and k-Means in General Metric Spaces



Abstract

Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-median and k-means variants which, given a set P of points from a metric space and a parameter k < |P|, require identifying a set S of k centers minimizing, respectively, the sum of the distances and the sum of the squared distances of all points in P from their closest centers. Our specific focus is on general metric spaces, for which it is reasonable to require that the centers belong to the input set (i.e., S ⊆ P). We present coreset-based 3-round distributed approximation algorithms for the above problems using the MapReduce computational model. The algorithms are rather simple and obliviously adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Remarkably, the algorithms attain approximation ratios that can be made arbitrarily close to those achievable by the best known polynomial-time sequential approximations, and they are very space efficient for small D, requiring local memory sizes substantially sublinear in the input size. To the best of our knowledge, no previous distributed approaches were able to attain similar quality-performance guarantees in general metric spaces.
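To make the two objectives and the coreset idea concrete, here is a minimal Python sketch. It assumes a weighted cost sum over p of w(p) · min over c in S of d(p, c)^z, with z = 1 for k-median and z = 2 for k-means. It always picks centers from the input, matching the S ⊆ P requirement. It is not the authors' algorithm. The functions `local_search`, `weighted_coreset` and `mapreduce_sketch` and the per-partition coreset size `m` are hypothetical. A classic single-swap local search stands in for "the best known polynomial-time sequential approximations". The MapReduce rounds are simulated sequentially in one process. The coreset size is a fixed knob rather than something that adapts to the doubling dimension D.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Demo metric only; any metric d(a, b) can be passed instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cost(points: Sequence[Point], weights: Sequence[float],
         centers: Sequence[Point], dist: Metric, power: int) -> float:
    """Weighted objective: power=1 gives k-median, power=2 gives k-means."""
    return sum(w * min(dist(p, c) for c in centers) ** power
               for p, w in zip(points, weights))


def local_search(points: Sequence[Point], weights: Sequence[float], k: int,
                 dist: Metric, power: int, max_iters: int = 50) -> List[Point]:
    """Single-swap local search; centers are always input points (S ⊆ P)."""
    centers = list(points[:k])  # arbitrary starting solution
    best = cost(points, weights, centers, dist, power)
    for _ in range(max_iters):
        improved = False
        for i in range(len(centers)):
            for cand in points:
                if cand in centers:
                    continue
                trial = centers[:i] + [cand] + centers[i + 1:]
                trial_cost = cost(points, weights, trial, dist, power)
                if trial_cost < best:  # accept only strict improvements
                    centers, best, improved = trial, trial_cost, True
        if not improved:
            break
    return centers


def weighted_coreset(points: Sequence[Point], weights: Sequence[float], m: int,
                     dist: Metric, power: int) -> Tuple[List[Point], List[float]]:
    """Summarize one partition by m local centers, each carrying the total
    weight of the points assigned to it (a generic composable coreset)."""
    centers = local_search(points, weights, m, dist, power)
    mass: Dict[Point, float] = {c: 0.0 for c in centers}
    for p, w in zip(points, weights):
        nearest = min(centers, key=lambda c: dist(p, c))
        mass[nearest] += w
    return list(mass), list(mass.values())


def mapreduce_sketch(P: Sequence[Point], k: int, num_parts: int, dist: Metric,
                     power: int, m: int = 0, seed: int = 0) -> List[Point]:
    """Simulates the rounds in one process; m (coreset size per part) is a
    hypothetical knob, not the doubling-dimension-driven size of the paper."""
    m = m or 2 * k
    shuffled = list(P)
    random.Random(seed).shuffle(shuffled)
    # Round 1: each "reducer" gets one part and builds a local weighted coreset.
    coreset_pts: List[Point] = []
    coreset_w: List[float] = []
    for i in range(num_parts):
        part = shuffled[i::num_parts]
        pts, ws = weighted_coreset(part, [1.0] * len(part), m, dist, power)
        coreset_pts += pts
        coreset_w += ws
    # Round 2: a single reducer solves the small weighted instance.
    return local_search(coreset_pts, coreset_w, k, dist, power)


if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
    P = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
         for cx, cy in blobs for _ in range(20)]
    S = mapreduce_sketch(P, k=3, num_parts=3, dist=euclidean, power=1)
    print("centers:", S)
    print("k-median cost on P:", cost(P, [1.0] * len(P), S, euclidean, 1))
```

Every coreset point is an element of P, so the final centers also satisfy S ⊆ P. In the second phase, the single reducer holds only the union of the weighted coresets. That union has at most num_parts · m points, which illustrates how local memory can stay well below |P|. The paper's bounds on how large the coresets must be are not modeled here.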