Venue: SIGMOD/PODS

On Synopses for Distinct-Value Estimation Under Multiset Operations

Abstract

The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. We provide DV estimation techniques that are designed for use within a flexible and scalable "synopsis warehouse" architecture. In this setting, incoming data is split into partitions and a synopsis is created for each partition; each synopsis can then be used to quickly estimate the number of DVs in its corresponding partition. By combining and extending a number of results in the literature, we obtain both appropriate synopses and novel DV estimators to use in conjunction with these synopses. Our synopses can be created in parallel, and can then be easily combined to yield synopses and DV estimates for arbitrary unions, intersections or differences of partitions. Our synopses can also handle deletions of individual partition elements. We use the theory of order statistics to show that our DV estimators are unbiased, and to establish moment formulas and sharp error bounds. Based on a novel limit theorem, we can exploit results due to Cohen in order to select synopsis sizes when initially designing the warehouse. Experiments and theory indicate that our synopses and estimators lead to lower computational costs and more accurate DV estimates than previous approaches.
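The abstract does not spell out the synopsis or the estimator. The Python below is a minimal sketch that assumes a "k minimum values" (KMV) style synopsis, which keeps the k smallest hashed values of each partition, and the estimator (k - 1)/U_(k), where U_(k) is the k-th smallest stored hash. It builds per-partition synopses and combines them for unions and intersections. The helper names, the SHA-256-based hash, and the use of one common k for every partition are illustrative assumptions, not the paper's API. Differences and deletions, which the abstract says the synopses also support, are not covered here.

```python
import hashlib
import heapq


def unit_hash(value: str) -> float:
    """Hash a value to a pseudo-uniform number in (0, 1] (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Keep the top 53 bits so the result is exactly representable as a float.
    m = int.from_bytes(digest[:8], "big") >> 11
    return (m + 1) / 2**53


def kmv_synopsis(values, k: int) -> list:
    """Hypothetical helper: keep the k smallest distinct hash values of one partition, sorted."""
    hashes = {unit_hash(str(v)) for v in values}
    return heapq.nsmallest(k, hashes)


def estimate_dv(synopsis: list, k: int) -> float:
    """Estimate the number of distinct values as (k - 1) / U_(k); assumes k >= 2."""
    if len(synopsis) < k:
        # Fewer than k distinct hashes: the partition was stored completely,
        # so the count is exact (up to hash collisions).
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


def union_synopsis(syn_a: list, syn_b: list, k: int) -> list:
    """Synopsis of A union B: the k smallest hash values across both synopses."""
    return heapq.nsmallest(k, set(syn_a) | set(syn_b))


def estimate_intersection(syn_a: list, syn_b: list, k: int) -> float:
    """Scale the union estimate by the fraction of the combined synopsis present in both inputs."""
    combined = union_synopsis(syn_a, syn_b, k)
    in_both = set(syn_a) & set(syn_b)
    # A hash among the k smallest of A union B is also among the k smallest of A
    # (resp. B) whenever it occurs there, so testing membership in both synopses is exact.
    k_cap = sum(1 for h in combined if h in in_both)
    if len(combined) < k:
        return float(k_cap)
    return (k_cap / k) * (k - 1) / combined[k - 1]
```

For example, `estimate_dv(kmv_synopsis(["a", "b", "c", "a"], 8), 8)` returns `3.0`, because a partition with fewer than k distinct values is stored exactly. The combine steps only read the stored hashes, so synopses built in parallel can be merged without revisiting the underlying partitions.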
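The order-statistics argument for unbiasedness can be illustrated for an estimator of the form \hat{D}_k = (k-1)/U_{(k)}, where U_{(k)} is the k-th smallest hash value among the D distinct values of a partition. This is a sketch under an idealized assumption that is ours, not the abstract's: the hash values are i.i.d. Uniform(0, 1) and 2 \le k \le D. The k-th smallest of D such values has a Beta distribution, which gives

$$
U_{(k)} \sim \mathrm{Beta}(k,\, D-k+1), \qquad
\mathbb{E}\!\left[\frac{1}{U_{(k)}}\right]
= \frac{B(k-1,\, D-k+1)}{B(k,\, D-k+1)}
= \frac{\Gamma(k-1)\,\Gamma(D+1)}{\Gamma(k)\,\Gamma(D)}
= \frac{D}{k-1},
$$

$$
\mathbb{E}\big[\hat{D}_k\big] = (k-1)\,\mathbb{E}\!\left[\frac{1}{U_{(k)}}\right] = D .
$$

The factor k - 1 (rather than k) in the numerator is exactly what removes the bias of the naive estimator k/U_{(k)}.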