Single pass space efficient system and method for generating an approximate quantile in a data set having an unknown size

Abstract

A space-efficient system and method for generating an approximate φ-quantile data element of a data set in a single pass over the data set, without a priori knowledge of the size of the data set. The approximate φ-quantile is guaranteed to lie within a user-specified approximation error ε of the true quantile being sought with a probability of at least 1−δ, with δ being a user-defined probability of failure. b buffers, each having a capacity of k elements, initially are filled with elements from the data set, with the values of b and k depending on the approximation error ε and the probability δ. The buffers are then collapsed into an output buffer, with the remaining buffers then being refilled with elements, collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output buffer remains. The element of that output buffer corresponding to the approximate quantile is then output as the approximate quantile. In later iterations (when the height of the tree is at least equal to a predetermined height that depends on δ and ε), the data is sampled non-uniformly to populate the buffers to achieve the desired performance. Parallel processors can be used, with the final output buffers of the processors being sent to a collecting processor P0 as input buffers to the collecting processor P0.
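The fill-and-collapse cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the names `collapse` and `approximate_quantile` are hypothetical, b and k are passed in directly rather than derived from ε and δ, the collapse rule (picking evenly spaced elements from the weighted merge) is one common choice assumed here, and the non-uniform sampling step is omitted. It therefore carries no probabilistic guarantee.

```python
import heapq
import math


def collapse(buffers, k):
    """Collapse weighted buffers into one buffer of k elements.

    buffers: list of (weight, sorted_list), each list holding exactly k elements.
    Assumed rule: treat each element as repeated `weight` times, merge
    everything in sorted order, and keep k evenly spaced elements.
    """
    total_weight = sum(w for w, _ in buffers)
    merged = heapq.merge(*[[(v, w) for v in data] for w, data in buffers])
    offset = (total_weight + 1) // 2  # simplification: fixed offset for even weights
    targets = [j * total_weight + offset for j in range(k)]  # 1-based positions
    out, pos, t = [], 0, 0
    for value, weight in merged:
        pos += weight  # last position covered by this element's copies
        while t < k and targets[t] <= pos:
            out.append(value)
            t += 1
    return (total_weight, out)


def approximate_quantile(stream, phi, b=5, k=100):
    """Single pass over `stream` using at most b buffers of k elements each."""
    full = []     # full buffers as (weight, sorted list)
    current = []  # buffer currently being filled
    for x in stream:
        current.append(x)
        if len(current) == k:
            full.append((1, sorted(current)))
            current = []
            if len(full) == b:
                # All buffers are full: collapse them into one output buffer,
                # which frees the remaining b - 1 buffers for refilling.
                full = [collapse(full, k)]
    if current:
        full.append((1, sorted(current)))  # last, possibly partial, buffer
    if not full:
        raise ValueError("empty stream")
    # Final output: weighted quantile over whatever buffers remain.
    merged = list(heapq.merge(*[[(v, w) for v in data] for w, data in full]))
    total = sum(w for _, w in merged)
    target = max(1, math.ceil(phi * total))
    pos = 0
    for value, weight in merged:
        pos += weight
        if pos >= target:
            return value
```

Calling `approximate_quantile(data, 0.5)` on any iterable returns an element near the median while holding at most b·k elements plus the buffer being filled. Larger b and k tighten the approximation at the cost of memory, which is the trade-off the abstract ties to ε and δ.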