【24h】

Coresets and approximate clustering for Bregman divergences

机译:Bregman散度的核心集和近似聚类

获取原文

摘要

We study the generalized k-median problem with respect to a Bregman divergence Dφ. Given a finite set P ⊆ Rd of size n, our goal is to find a set C of size k such that the sum of errors cost(P, C) = ΣpεP mincεC {DΦ(p, c)} is minimized. The Bregman k-median problem plays an important role in many applications, e.g. information theory, statistics, text classification, and speech processing. We give the first coreset construction for this problem for a large subclass of Bregman divergences, including important dissimilarity measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. Using these coresets, we give a (1 + ε)-approximation algorithm for the Bregman k-median problem with running time O (dkn + d22(k/c)thetas;(1) logk+2n). This result improves over the previousely fastest known (1 + ε)-approximation algorithm from [1]. Unlike the analysis of most coreset constructions our analysis does not rely on the construction of ε-nets. Instead, we prove our results by purely combinatorial means.
机译:我们研究关于Bregman发散Dφ的广义k中值问题。给定大小为n的有限集合P⊆Rd,我们的目标是找到大小为k的集合C,以使误差之和cost(P,C)=ΣpεPmincεC{DΦ(p,c)}最小。 Bregman k中位数问题在许多应用中起着重要作用,例如信息论,统计,文本分类和语音处理。对于大量的Bregman散度子集,我们提供了针对该问题的第一个核集构造,其中包括重要的差异性度量,例如Kullback-Leibler散度和Itakura-Saito散度。使用这些核心集,我们针对运行时间为O(dkn + d22(k / c)θ;(1)logk + 2n)的Bregman k中值问题给出(1 +ε)近似算法。此结果比[1]中已知的最快的(1 +ε)近似算法有所改进。与大多数核心集结构的分析不同,我们的分析不依赖于ε-网络的结构。相反,我们通过纯粹的组合方法证明了我们的结果。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号