Published in: IEEE Transactions on Cloud Computing
FastRAQ: A Fast Approach to Range-Aggregate Queries in Big Data Environments

Abstract

Range-aggregate queries apply a given aggregate function to all tuples within the specified query ranges. Existing approaches to range-aggregate queries cannot provide accurate results quickly enough in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has O(1) time complexity for data updates and O(N/(P×B)) time complexity for range-aggregate queries, where N is the number of distinct tuples for all dimensions, P is the partition number, and B is the bucket number in the histogram. We implement the FastRAQ approach on the Linux platform, and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval.
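The partition-then-summarize idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the round-robin split stands in for their balanced partitioning algorithm, and a plain equi-width count histogram stands in for their local estimation sketch. All function names and parameters below are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: round-robin partitioning and equi-width
# histograms are assumptions, not the method described in the paper.

def partition_round_robin(values, num_partitions):
    """Split values into partitions of (nearly) equal size."""
    parts = [[] for _ in range(num_partitions)]
    for i, v in enumerate(values):
        parts[i % num_partitions].append(v)
    return parts


def build_histogram(values, lo, hi, num_buckets):
    """Build an equi-width count histogram over [lo, hi)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that a value equal to hi lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts


def estimate_count(counts, lo, hi, q_lo, q_hi):
    """Estimate how many values fall in [q_lo, q_hi), assuming values
    are uniformly distributed inside each bucket."""
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        total += c * overlap / width
    return total


def range_count(sketches, lo, hi, q_lo, q_hi):
    """Answer a range COUNT query by summing the local estimates."""
    return sum(estimate_count(s, lo, hi, q_lo, q_hi) for s in sketches)


if __name__ == "__main__":
    data = list(range(1000))
    partitions = partition_round_robin(data, 4)
    local_sketches = [build_histogram(p, 0, 1000, 10) for p in partitions]
    print(range_count(local_sketches, 0, 1000, 150, 250))
```

The query touches only the per-partition histograms, never the raw records, which is the property that lets an approach of this kind trade a bounded estimation error for speed. The uniform-within-bucket assumption used here is a simplification, so the error behaviour of this sketch says nothing about the figures reported in the abstract.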