ACM SIGMOD International Conference on Management of Data

Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets


Abstract

In a recent paper [MRL98], we described a general framework for single-pass approximate quantile-finding algorithms. This framework includes several known algorithms as special cases. Within the framework, we identified a new algorithm whose main-memory requirement is significantly smaller than that of other known algorithms. In this paper, we address two issues left open in that earlier paper. First, all known space-efficient algorithms for approximate quantile finding require advance knowledge of the length of the input sequence, yet many important database applications that employ quantiles cannot provide this information. We present a novel non-uniform random sampling scheme and an extension of our framework; together, they form the basis of a new algorithm that computes approximate quantiles without knowing the input sequence length. Second, if the desired quantile is an extreme value (e.g., within the top 1% of the elements), the space requirements of currently known algorithms are overly pessimistic. We provide a simple algorithm that estimates extreme values using less space than the earlier, more general technique for computing all quantiles requires. Our principal observation is that random sampling is quantifiably more effective for estimating extreme values than for estimating the median.
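The first contribution lets quantiles be computed without knowing the input length in advance. The paper's own non-uniform sampling scheme and framework extension are not reproduced here. As a simpler, swapped-in illustration of why sampling removes the need to know the length, the sketch below uses classic uniform reservoir sampling (Vitter's Algorithm R) and reads an approximate quantile off the sorted sample. The function names `reservoir_sample` and `approx_quantile` are hypothetical.

```python
import math
import random
from typing import Iterable, List, Optional


def reservoir_sample(stream: Iterable[float], k: int,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Keep a uniform random sample of up to k items from a stream whose
    length is never supplied (Algorithm R; NOT the paper's non-uniform scheme)."""
    if rng is None:
        rng = random.Random()
    reservoir: List[float] = []
    for i, x in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(x)
        else:
            # Item i (0-based) enters with probability k / (i + 1),
            # replacing a uniformly chosen slot. randint bounds are inclusive.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = x
    return reservoir


def approx_quantile(sample: List[float], phi: float) -> float:
    """Return the sample element of rank ceil(phi * m) (1-based) as an
    estimate of the phi-quantile of the whole stream."""
    if not sample:
        raise ValueError("empty sample")
    if not 0.0 < phi <= 1.0:
        raise ValueError("phi must be in (0, 1]")
    ordered = sorted(sample)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


# Usage: the generator's length is never passed to the sampler.
sample = reservoir_sample((x for x in range(1, 100_001)), 2000, random.Random(7))
print(approx_quantile(sample, 0.5))  # close to 50,000
```

Note that a fixed-size uniform reservoir only demonstrates the length-oblivious property; it does not by itself provide the space and accuracy guarantees the paper derives for its non-uniform scheme.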
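The closing observation about extreme values can be made concrete with a back-of-the-envelope calculation. This illustrates the idea under stated assumptions and is not the bound proved in the paper. Suppose s elements are sampled uniformly. Then the number of sampled elements below the true phi-quantile is Binomial(s, phi), so the sampled rank fraction has variance phi(1 - phi)/s. Requiring that fraction to stay within +/- eps at a given confidence, under a normal approximation, gives s ≈ z^2 phi(1 - phi) / eps^2. The helper `sample_size_for_quantile` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_quantile(phi: float, eps: float,
                             confidence: float = 0.95) -> int:
    """Approximate sample size so that the sampled rank fraction of the true
    phi-quantile stays within +/- eps (normal approximation to the binomial).
    Illustrative assumption only, not the paper's analysis."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must be in (0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # The binomial variance phi * (1 - phi) peaks at the median (phi = 0.5).
    return math.ceil(z * z * phi * (1.0 - phi) / (eps * eps))


print(sample_size_for_quantile(0.5, 0.001))   # median
print(sample_size_for_quantile(0.99, 0.001))  # top-1% boundary
```

With eps = 0.001 and 95% confidence, this estimate calls for 960,365 samples at the median but only 38,031 at the top-1% boundary, roughly 25 times fewer, because phi(1 - phi) drops from 0.25 to 0.0099.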
