On coresets for k-means and k-median clustering

Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a set P of n points in R^d, one can compute a weighted set S ⊆ P, of size O(k ε^{-d} log n), such that one can compute the k-median/means clustering on S instead of on P, and get a (1+ε)-approximation. As a result, we improve the fastest known algorithms for (1+ε)-approximate k-means and k-median. Our algorithms have linear running time for fixed k and ε. In addition, we can maintain the (1+ε)-approximate k-median or k-means clustering of a stream when points are only inserted, using polylogarithmic space and update time.
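To make the idea of a weighted coreset S ⊆ P concrete, here is a minimal Python sketch. It is not the construction from the paper: it assumes a plain uniform grid, keeps one input point per non-empty cell, and weights it by the number of points in that cell. The helper names `kmeans_cost` and `grid_coreset`, the cell size, and the synthetic data are all illustrative assumptions. The sketch only checks empirically that the weighted k-means cost on S is close to the cost on P for one candidate set of centers; it carries no (1+ε) guarantee.

```python
import random
from collections import defaultdict


def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum of w * squared distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        total += w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


def grid_coreset(points, cell):
    """Hypothetical helper (not the paper's method): keep one input point per
    uniform grid cell and weight it by the number of points in that cell."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(x // cell) for x in p)  # index of the cell containing p
        buckets[key].append(p)
    reps = [members[0] for members in buckets.values()]
    weights = [float(len(members)) for members in buckets.values()]
    return reps, weights


# Synthetic input (assumption): three Gaussian blobs in the plane.
random.seed(0)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
P = [
    (cx + random.gauss(0, 1), cy + random.gauss(0, 1))
    for cx, cy in blob_centers
    for _ in range(2000)
]

S, W = grid_coreset(P, cell=0.25)

# Evaluate the same candidate centers on the full set and on the weighted subset.
candidate = [(0.5, -0.5), (9.0, 10.5), (1.0, 9.0)]
full = kmeans_cost(P, candidate)
approx = kmeans_cost(S, candidate, W)
rel_err = abs(full - approx) / full

print(f"|P| = {len(P)}, |S| = {len(S)}, relative cost error = {rel_err:.4f}")
```

The weights sum to |P|, so S stands in for P when evaluating any set of centers. A uniform grid like this one grows with the spread of the data, which is why it should be read only as an illustration of what "clustering on S instead of on P" means, not as a way to reach the size bound stated in the abstract.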
