Extending Complex Ad-Hoc OLAP

Abstract

Large-scale data analysis and mining activities require sophisticated information extraction queries. Many queries require complex aggregation, and many of these aggregates are non-distributive. Conventional solutions to this problem involve defining User Defined Aggregate Functions (UDAFs). However, the use of UDAFs entails several problems. Defining a new UDAF can be a significant burden for the user, and optimizing queries involving UDAFs is difficult because of the "black box" nature of the UDAF. In this paper, we present a method for expressing nested aggregates in a declarative way. A nested aggregate, which is a rollup of another aggregated value, expresses a wide range of useful non-distributive aggregation. For example, most-frequent-type aggregation can be naturally expressed using nested aggregation, as in the query "For each product, report its total sales during the month with the largest total sales of the product". By expressing complex aggregates declaratively, we relieve the user of the burden of defining UDAFs and allow the evaluation of the complex aggregates to be optimized. We use the Extended Multi-Feature (EMF) syntax as the basis for expressing nested aggregation. An advantage of this approach is that EMF SQL can already express a wide range of complex aggregation succinctly, and EMF SQL queries are easily optimized into efficient query plans. We show that nested aggregation queries can be evaluated efficiently using a small extension to the EMF SQL query evaluation algorithm. A side effect of this extension is that EMF SQL can also express complex aggregation of data from multiple sources.
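To make the product/month example concrete, here is a minimal sketch of what that nested aggregate computes. It is neither EMF SQL nor the paper's evaluation algorithm; it is plain Python, assuming a hypothetical `sales` table of `(product, month, amount)` rows. The hypothetical function `peak_month_sales` computes the inner aggregate (SUM per product and month) and then rolls those sums up with an outer aggregate (MAX per product). For contrast, the hypothetical `peak_month_sales_sql` expresses the same query in conventional SQL through Python's built-in `sqlite3` module, where the nesting has to be written as a derived table (here a `WITH` clause).

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema for illustration: (product, month, sale amount).
Row = Tuple[str, int, float]


def peak_month_sales(rows: Iterable[Row]) -> Dict[str, Tuple[int, float]]:
    """For each product, return (month, total) for its month with the largest total sales."""
    # Inner aggregate: SUM(amount) grouped by (product, month).
    monthly: Dict[Tuple[str, int], float] = defaultdict(float)
    for product, month, amount in rows:
        monthly[(product, month)] += amount

    # Outer aggregate: MAX over the monthly sums, grouped by product.
    best: Dict[str, Tuple[int, float]] = {}
    for (product, month), total in monthly.items():
        current = best.get(product)
        # Ties go to the earlier month (an arbitrary choice made for this sketch).
        if (current is None or total > current[1]
                or (total == current[1] and month < current[0])):
            best[product] = (month, total)
    return best


def peak_month_sales_sql(rows: List[Row]) -> Dict[str, float]:
    """Same nested aggregate in standard SQL: an outer GROUP BY over an inner GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (product TEXT, month INTEGER, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        query = """
            WITH monthly AS (
                SELECT product, month, SUM(amount) AS total
                FROM sales
                GROUP BY product, month
            )
            SELECT product, MAX(total) FROM monthly GROUP BY product
        """
        return {product: total for product, total in conn.execute(query)}
    finally:
        conn.close()


if __name__ == "__main__":
    sales = [("apple", 1, 10.0), ("apple", 1, 5.0), ("apple", 2, 12.0),
             ("pear", 1, 3.0), ("pear", 3, 4.0), ("pear", 3, 4.0)]
    print(peak_month_sales(sales))
    print(peak_month_sales_sql(sales))
```

With the sample rows, apple's monthly totals are 15.0 (month 1) and 12.0 (month 2), so its peak month is month 1 with 15.0; pear peaks in month 3 with 8.0. The SQL version needs a second level of grouping to express the rollup of an aggregated value, which is the kind of nesting the abstract proposes to express declaratively.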