Finding Heavy Hitters from Lossy or Noisy Data


Abstract

Motivated by Dvir et al. and Wigderson and Yehudayoff [3,10], we examine the problem of discovering the heavy hitters of a distribution on strings (i.e., the set of strings whose probability is at least a given minimum) from lossy or noisy samples. Whereas previous work concentrated on finding both the most probable elements and their probabilities, we consider enumeration: the problem of finding a list that includes all the most probable elements, without their associated probabilities. Unlike Wigderson and Yehudayoff [10], we do not assume that the underlying distribution has small support, and our time bounds are independent of the support size. For the enumeration problem, we give a polynomial-time algorithm in the lossy sample model for any constant erasure probability μ < 1, and a quasi-polynomial-time algorithm in the noisy sample model for any bit-flip noise probability ν < 1/2. We extend the lower bound on the number of samples required for the reconstruction problem from [3] to the enumeration problem, showing that when μ = 1 - o(1), no polynomial-time algorithm exists.
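The abstract names the two sample models without spelling them out. As a minimal sketch, assuming binary strings, independent per-position corruption, and '?' as the erasure symbol (all assumptions not stated in the abstract), the models could be simulated as follows. The function names and the dictionary representation of the distribution are hypothetical and chosen only for illustration; this is not the enumeration algorithm from the paper.

```python
import random


def draw(dist, rng):
    """Draw one string from dist, a dict mapping string -> probability (assumed format)."""
    strings = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(strings, weights=weights, k=1)[0]


def lossy_sample(x, mu, rng):
    """Lossy model (assumed): erase each position independently with probability mu."""
    return "".join("?" if rng.random() < mu else b for b in x)


def noisy_sample(x, nu, rng):
    """Noisy model (assumed): flip each bit independently with probability nu."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < nu else b for b in x)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical distribution: two heavy strings and one light string.
    dist = {"0000": 0.5, "1111": 0.4, "1010": 0.1}
    for _ in range(3):
        x = draw(dist, rng)
        print(x, lossy_sample(x, 0.5, rng), noisy_sample(x, 0.25, rng))
```

In this sketch an observer sees only the output of `lossy_sample` or `noisy_sample`, never `x` itself; the enumeration problem asks for a list containing every string whose probability under `dist` meets the threshold.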
