Journal: Machine Learning

A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering


Abstract

We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown, arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clustering algorithms that approximate the optimal clustering. We show that K-median clustering, as well as the K-means and vector quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the running time of our algorithm depends only linearly on the Euclidean dimension. Our main technical tool is a uniform convergence result for center-based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
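The sample-then-cluster idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: Lloyd's heuristic stands in for whatever solver is run on the sample (it carries no approximation guarantee), `sample_size` is picked by hand rather than derived from the confidence and accuracy parameters, and the names `sample_based_kmeans`, `lloyd`, and `kmeans_cost` are hypothetical. The point it shows is only that the centers are fitted on a uniform i.i.d. sample and then evaluated on the full input.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def lloyd(points, k, iters, rng):
    """Lloyd's heuristic (a stand-in solver, assumed here for illustration)."""
    centers = rng.sample(points, k)  # initial centers drawn from the points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers


def sample_based_kmeans(data, k, sample_size, iters=20, seed=0):
    """Fit k centers on a uniform i.i.d. sample instead of the full input.

    The work done here depends on sample_size, k, iters, and the dimension,
    not on len(data), provided uniform sampling takes constant time.
    """
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in range(sample_size)]  # with replacement
    return lloyd(sample, k, iters, rng)


if __name__ == "__main__":
    # Synthetic input: two Gaussian blobs in the plane (illustrative data only).
    gen = random.Random(1)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(1000)]
    data += [(gen.gauss(8, 1), gen.gauss(8, 1)) for _ in range(1000)]
    centers = sample_based_kmeans(data, k=2, sample_size=200)
    print("centers fitted on the sample:", centers)
    print("cost on the full input:", kmeans_cost(data, centers))
```

Only the final `kmeans_cost` call touches the whole input, and it is there purely to check the result; the fitting step never reads more than `sample_size` points.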
