Journal: Knowledge and Information Systems

Efficiently processing deterministic approximate aggregation query on massive data



Abstract

In practical applications, aggregation is an important operation that returns statistical characterizations of a subset of a data set. On massive data, approximate aggregation is often preferable because it is more timely and responsive. This paper focuses on deterministic approximate aggregation, which returns a running aggregate within a progressive deterministic error interval. Existing methods either return approximate results with fixed accuracy, return online approximate aggregates with a probabilistic confidence interval, or incur a high I/O cost on massive data. This paper proposes the LDA algorithm to compute deterministic approximate aggregates on massive data efficiently. LDA uses a hierarchically structured lattice of selection attributes to distribute tuples and obtain a horizontal partitioning of the table. In each partition, each selection attribute is kept in a column file and each ranking attribute is transposed into bit-slices. Given a selection condition, only the relevant partitions are involved in computing the running aggregate. A compact storage scheme based on the Z-order space-filling curve is proposed to reduce the management cost of the partitions. An error reduction method is devised to narrow the error interval while the running aggregate is computed. Extensive experimental results on synthetic and real data sets show that LDA has a significant performance advantage over the existing algorithms.
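
The abstract does not spell out how bit-slices yield a deterministic error interval, so the following is only a minimal sketch of the general idea, not the paper's LDA implementation. It assumes non-negative integer values of a fixed bit width and reads slices from the most significant bit down; the names `to_bit_slices` and `running_sum` are hypothetical. After each slice, the sum over the selected rows is guaranteed to lie between the reported lower and upper bounds.

```python
def to_bit_slices(values, bits):
    """Transpose non-negative integers into bit-slices, most significant first.

    Slice k is a Python int used as a bitmap: its i-th bit equals
    bit (bits - 1 - k) of values[i].
    """
    slices = []
    for pos in range(bits - 1, -1, -1):
        bitmap = 0
        for i, v in enumerate(values):
            if (v >> pos) & 1:
                bitmap |= 1 << i
        slices.append(bitmap)
    return slices


def running_sum(slices, selected, bits):
    """Yield (lower, upper) bounds on SUM over the selected rows.

    `selected` is a bitmap of rows that satisfy the selection condition.
    One pair of bounds is produced after each slice is read.
    """
    n_selected = bin(selected).count("1")
    lower = 0
    for k, bitmap in enumerate(slices):
        pos = bits - 1 - k
        # Rows that are selected and have this bit set contribute 2**pos each.
        lower += bin(bitmap & selected).count("1") << pos
        # The unread low bits can add at most 2**pos - 1 per selected row.
        upper = lower + n_selected * ((1 << pos) - 1)
        yield lower, upper


if __name__ == "__main__":
    values = [5, 3, 7, 2]   # illustrative data, 3 bits per value
    selected = 0b1011       # rows 0, 1 and 3 are selected
    for lo, hi in running_sum(to_bit_slices(values, 3), selected, 3):
        print(lo, hi)
```

In this sketch the interval shrinks with every slice and collapses to the exact sum once the last slice is read, so a caller could stop early as soon as the interval is tight enough for its purposes.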
