Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

Coresets and Approximate Clustering for Bregman Divergences

Abstract

We study the generalized $k$-median problem with respect to a Bregman divergence $D_\Phi$. Given a finite set $P \subset \mathbb{R}^d$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $\mathrm{cost}(P, C) = \sum_{p \in P} \min_{c \in C} D_\Phi(p, c)$ is minimized. The Bregman $k$-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We give the first coreset construction for this problem for a large subclass of Bregman divergences, including important dissimilarity measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence. Using these coresets, we give a $(1+\varepsilon)$-approximation algorithm for the Bregman $k$-median problem with running time $O\big(dkn + d^2\, 2^{(k/\varepsilon)^{\Theta(1)}} \log^{k+2} n\big)$. This result improves over the previously fastest known $(1+\varepsilon)$-approximation algorithm from [1]. Unlike the analysis of most coreset constructions, our analysis does not rely on the construction of $\varepsilon$-nets. Instead, we prove our results by purely combinatorial means.
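To make the objective concrete, here is a minimal Python sketch. It is not the authors' coreset construction or their $(1+\varepsilon)$-approximation algorithm. It only evaluates a Bregman divergence through the standard definition $D_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q), p - q \rangle$ and the cost function $\mathrm{cost}(P, C)$ for three common choices of $\Phi$. All function names (`bregman`, `cost`, `neg_entropy`, and so on) are hypothetical. The Kullback-Leibler variant is the generalized (unnormalized) form, which reduces to the usual KL divergence when both arguments are probability vectors. The KL and Itakura-Saito generators assume strictly positive coordinates.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Phi = Callable[[Vector], float]
Grad = Callable[[Vector], list[float]]


def bregman(phi: Phi, grad_phi: Grad, p: Vector, q: Vector) -> float:
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    g = grad_phi(q)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return phi(p) - phi(q) - inner


# Squared Euclidean distance: phi(x) = ||x||^2.
def sq_norm(x: Vector) -> float:
    return sum(xi * xi for xi in x)

def sq_norm_grad(x: Vector) -> list[float]:
    return [2.0 * xi for xi in x]


# Generalized Kullback-Leibler: phi(x) = sum x_i log x_i (assumes x_i > 0).
def neg_entropy(x: Vector) -> float:
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x: Vector) -> list[float]:
    return [math.log(xi) + 1.0 for xi in x]


# Itakura-Saito: phi(x) = -sum log x_i (Burg entropy; assumes x_i > 0).
def burg(x: Vector) -> float:
    return -sum(math.log(xi) for xi in x)

def burg_grad(x: Vector) -> list[float]:
    return [-1.0 / xi for xi in x]


def cost(points: Sequence[Vector], centers: Sequence[Vector],
         phi: Phi, grad_phi: Grad) -> float:
    """cost(P, C): each point p pays D_phi(p, c) to its cheapest center c.

    Note the argument order: the data point is first, the center second,
    matching D_phi(p, c) in the problem statement.
    """
    return sum(min(bregman(phi, grad_phi, p, c) for c in centers)
               for p in points)
```

For example, `cost([[1.0], [2.0], [10.0]], [[1.5], [10.0]], sq_norm, sq_norm_grad)` evaluates to `0.5`: the first two points each pay $0.25$ to the center $1.5$, and the third point coincides with its center. Passing `phi` and its gradient separately keeps the sketch generic over $\Phi$ rather than hard-coding closed forms.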