IEEE Transactions on Knowledge and Data Engineering

Bit-Oriented Sampling for Aggregation on Big Data



Abstract

The efficiency of big data analysis has become a bottleneck. Aggregation is a fundamental analytical task, and because it is usually time-consuming, sampling-based aggregation is often used to improve response time at the cost of result accuracy. In all related work, sampling is conducted at the granularity of data items. Because the bits at different bit positions of a data item contribute differently to an aggregation result, the performance of sampling-based aggregation can potentially be improved if sampling is conducted at the granularity of bits. This paper therefore studies bit-oriented sampling for aggregation. Two methods of bit-oriented uniform-sampling-based aggregation, DVBM and DVFM, are proposed, based on the central limit theorem or Chebyshev's inequality. They are much more efficient than traditional data-oriented uniform-sampling-based aggregation methods. DVBM guarantees a given error bound on the aggregation result under the assumption that the sample variance equals the dataset variance. By contrast, DVFM achieves the same goal without that assumption, but it may require a larger sample size. Extensive experiments show that DVBM and DVFM are both efficient and effective.
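The abstract does not spell out how DVBM or DVFM work, so the following is only a hypothetical sketch of the general idea of sampling at bit granularity, not the paper's algorithms. It relies on the identity AVG = Σ_j 2^j · p_j for non-negative integers, where p_j is the fraction of items whose bit j is 1, and estimates each p_j from its own uniform sample. The function names (`chebyshev_size`, `clt_size`, `bit_sampled_avg`), the even per-bit error split eps_j = eps / (bits · 2^j), the union-bound split of delta, and the worst-case bit variance of 1/4 are all assumptions of this sketch. The two size functions contrast the tools the abstract names: the Chebyshev bound needs no variance estimate but asks for more samples, while the CLT (normal-approximation) bound is smaller but depends on an assumed variance.

```python
import math
import random
from statistics import NormalDist


def chebyshev_size(eps: float, delta: float) -> int:
    """Samples needed so a bit proportion is within eps w.p. >= 1 - delta.

    Chebyshev: P(|p_hat - p| >= eps) <= p(1 - p) / (n * eps^2) <= 1 / (4 * n * eps^2),
    since a 0/1 bit has variance at most 1/4. No variance estimate is required.
    """
    return math.ceil(1.0 / (4.0 * eps * eps * delta))


def clt_size(eps: float, delta: float, var: float = 0.25) -> int:
    """Normal-approximation (CLT) sample size: n >= z^2 * var / eps^2.

    `var` is an assumed bit variance (0.25 is the worst case for a 0/1 value).
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # two-sided (1 - delta) quantile
    return math.ceil(z * z * var / (eps * eps))


def bit_sampled_avg(data, bits, eps, delta, size_fn, rng):
    """Hypothetical bit-level estimator of AVG for ints in [0, 2**bits).

    AVG = sum_j 2**j * p_j. Bit j may add at most eps / bits to the total error,
    so p_j needs accuracy eps_j = eps / (bits * 2**j): high-order bits get a
    tighter budget and therefore more samples. delta is split evenly (union bound).
    """
    n_total = len(data)
    estimate = 0.0
    sizes = []
    for j in range(bits):
        eps_j = eps / (bits * 2 ** j)
        n_j = max(1, size_fn(eps_j, delta / bits))
        if n_j >= n_total:
            # Sampling would cost as much as a full scan, so read bit j exactly.
            n_j = n_total
            p_j = sum((x >> j) & 1 for x in data) / n_total
        else:
            sample = rng.choices(data, k=n_j)  # uniform sampling with replacement
            p_j = sum((x >> j) & 1 for x in sample) / n_j
        estimate += (2 ** j) * p_j
        sizes.append(n_j)
    return estimate, sizes


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.randrange(2 ** 16) for _ in range(100_000)]
    true_avg = sum(data) / len(data)
    for name, fn in (("Chebyshev", chebyshev_size), ("CLT", clt_size)):
        est, sizes = bit_sampled_avg(data, 16, 8.0, 0.05, fn, rng)
        print(name, round(est, 2), round(true_avg, 2), sizes)
```

Running it prints, for each bound, the estimated and exact averages and the per-bit sample sizes; the sizes grow with the bit position, and the Chebyshev column is larger than the CLT column, which mirrors the accuracy/sample-size trade-off the abstract describes. A real system would also avoid rescanning the data for the exactly-read high-order bits, which this sketch does not attempt.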


