Efficiently processing (p,ε)-approximate join aggregation on massive data

Xixian Han; Jianzhong Li; Hong Gao

Abstract

Join aggregation is an important operation in database systems: it returns aggregate information over the join of two or more tables. In many cases, returning an approximate result that satisfies a user-specified confidence interval, with a much shorter response time, is preferable to running the exact query. No previous work can efficiently process approximate join aggregation on massive data with arbitrary accuracy. This paper proposes a novel algorithm, pε-AJA ((p,ε)-Approximate Join Aggregation), that efficiently obtains an approximate join aggregate result with an arbitrary confidence interval. Two data structures with low space overhead, JRS and JPIPT, are presented. pε-AJA first uses JRS to return a quick response. If the approximate result computed from JRS does not satisfy the given confidence interval, JPIPT is used to obtain enough random join tuples. The paper presents a novel sampling algorithm that acquires a specified number of random JPIPT tuples, and proves its correctness. A tuple fetching method is proposed that uses the sampled JPIPT tuples to retrieve join tuples in a one-pass sequential scan of the joined tables. Construction and maintenance algorithms for JPIPT and JRS are also provided. The experimental results show that pε-AJA achieves a speedup of 3 times to 2 orders of magnitude over the existing algorithms and runs 1 to 4 orders of magnitude faster than the exact query.
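
The abstract does not spell out the internals of JRS or JPIPT, so the following is only a minimal sketch of the general idea of sampling join tuples until a confidence interval is tight enough. It assumes that a (p,ε) guarantee means the estimate lies within relative error ε of the true aggregate with confidence p, uses plain uniform sampling with replacement and a normal (central limit theorem) interval, and estimates an AVG aggregate. The names `approx_avg` and `draw` are hypothetical and do not come from the paper.

```python
import math
import random
from statistics import NormalDist


def approx_avg(draw, p=0.95, eps=0.05, batch=500, max_samples=1_000_000):
    """Estimate an average from random draws until the interval is tight enough.

    draw: hypothetical zero-argument callable that returns the aggregated
          attribute of one uniformly random join tuple (an assumption; the
          paper obtains random join tuples through JPIPT instead).
    Returns (estimate, half_width, samples_used).
    """
    if batch < 2:
        raise ValueError("batch must be at least 2")
    z = NormalDist().inv_cdf((1 + p) / 2)  # two-sided normal quantile for p
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        for _ in range(batch):
            x = draw()
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)  # Welford running sum of squared deviations
        # Half-width of the normal interval: z * sqrt(sample variance / n)
        half_width = z * math.sqrt(m2 / (n - 1) / n)
        # Stop once the relative half-width is at most eps, or at the sample cap
        if half_width <= eps * abs(mean) or n >= max_samples:
            return mean, half_width, n


if __name__ == "__main__":
    # Toy stand-in for the join result: a materialized list of values.
    data_rng = random.Random(1)
    join_values = [data_rng.uniform(50, 150) for _ in range(20000)]
    sampler = random.Random(2)
    est, hw, n = approx_avg(lambda: sampler.choice(join_values), eps=0.01)
    print(f"estimate={est:.2f} +/- {hw:.2f} after {n} samples")
```

In this sketch the join result is materialized only to keep the example self-contained; on massive data the point of a structure such as JPIPT would be to supply random join tuples without computing the full join.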


Bibliographic details

Source: Information Sciences: An International Journal, 2014, 20 pages
Authors: Xixian Han; Jianzhong Li; Hong Gao
Original format: PDF
Language: English
Chinese Library Classification: automatic information theory
Keywords: Massive data; Approximate join aggregation; JRS; JPIPT


