Journal: Information Sciences: An International Journal

Efficiently processing (p,ε)-approximate join aggregation on massive data

Abstract

Join aggregation is an important operation in database systems that returns aggregate information on the join of two or more tables. Compared with exact query processing, it is often better to return an approximate result that satisfies a user-specified confidence interval, with a much shorter response time. No previous work can efficiently process approximate join aggregation on massive data with arbitrary accuracy. This paper proposes a novel algorithm, pε-AJA ((p,ε)-Approximate Join Aggregation), to efficiently obtain an approximate join aggregate result with an arbitrary confidence interval. Two data structures with low space overhead, JRS and JPIPT, are presented. pε-AJA first uses JRS to return a quick response. If the approximate result computed from JRS does not satisfy the given confidence interval, JPIPT is exploited to obtain enough random join tuples. The paper presents a novel sampling algorithm that acquires a specified number of random JPIPT tuples, and proves its correctness. A tuple fetching method is proposed that uses the sampled JPIPT tuples to retrieve join tuples in a one-pass sequential scan of the joined tables. Construction and maintenance algorithms for JPIPT and JRS are also provided. The experimental results show that pε-AJA achieves a speedup of 3 times to 2 orders of magnitude over the existing algorithms and runs 1 to 4 orders of magnitude faster than the exact query.
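The two-stage idea in the abstract (answer from a small sample first, then draw more random join tuples only if the confidence interval is too wide) can be illustrated with a minimal sketch. This is not the paper's pε-AJA: the abstract does not describe the internals of JRS or JPIPT, so the sketch simply materializes the join in memory and samples from it, which a real system would avoid. It also assumes that "(p,ε)" means a relative error of at most ε with confidence p, and it uses a normal-approximation (CLT) interval. The names `hash_join_values` and `approx_join_sum` are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def hash_join_values(r, s):
    """Materialize the value column of R join S on key (stand-in for real join sampling).

    r, s: lists of (key, value) pairs; the aggregated value of a join tuple
    is assumed here to be r_value * s_value.
    """
    index = defaultdict(list)
    for key, value in s:
        index[key].append(value)
    return [rv * sv for key, rv in r for sv in index[key]]


def approx_join_sum(join_values, p=0.95, eps=0.05, first_batch=1000, seed=0):
    """Estimate SUM over the join from uniform random join tuples.

    Returns (estimate, half_width, sample_size). Sampling stops once the
    CLT half-width is at most eps * |estimate|, or once the sample is as
    large as the join itself (a fallback, not a guarantee).
    """
    rng = random.Random(seed)
    n_total = len(join_values)
    z = NormalDist().inv_cdf((1 + p) / 2)  # two-sided normal quantile
    sample = []
    batch = max(2, first_batch)  # stdev needs at least two points
    while True:
        # Draw more tuples uniformly at random, with replacement.
        sample.extend(rng.choice(join_values) for _ in range(batch))
        m = mean(sample)
        half_width = z * stdev(sample) / math.sqrt(len(sample))
        if half_width <= eps * abs(m) or len(sample) >= n_total:
            # Scale the per-tuple mean up to the full join size.
            return n_total * m, n_total * half_width, len(sample)
        batch = len(sample)  # double the sample size on each retry


if __name__ == "__main__":
    r = [(i % 50, float(i % 7 + 1)) for i in range(2000)]
    s = [(i % 50, float(i % 5 + 1)) for i in range(2000)]
    values = hash_join_values(r, s)
    estimate, half_width, n_used = approx_join_sum(values, p=0.95, eps=0.02)
    print(f"estimate={estimate:.0f} +/- {half_width:.0f} from {n_used} tuples")
    print(f"exact={sum(values):.0f} over {len(values)} join tuples")
```

Doubling the sample on each retry keeps the number of rounds logarithmic in the final sample size. In this sketch the first, small batch plays the role of the quick response and the later batches play the role of the additional random join tuples.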