Conference paper, IEEE International Conference on Data Engineering

Peeking into the optimization of data flow programs with MapReduce-style UDFs

Abstract

Data flows are a popular abstraction for defining data-intensive processing tasks. To support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to the UDFs known from relational DBMSs, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether such UDFs can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can improve runtime by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs, which is then used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers, including filter and aggregation push-down, bushy join orders, and the choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step by step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs that highlight the salient features of our approach.
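The abstract does not say exactly what information the static code analysis extracts from a UDF. A common choice is the set of record fields a UDF may read and the set it may write. The sketch below assumes exactly that, and shows how such sets could drive a conservative reorderability check and a simple filter push-down over a linear chain of operators. It further assumes every operator is deterministic and processes one record at a time (Map-style). `UdfProperties`, `can_swap`, and `push_down` are hypothetical names for illustration, not the authors' API; joins, aggregations, and cost-based plan enumeration are out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfProperties:
    """Hypothetical summary of what static code analysis might extract
    from a record-at-a-time UDF (an assumption, not the paper's format)."""
    name: str
    reads: frozenset[str]   # record fields the UDF may read
    writes: frozenset[str]  # record fields the UDF may modify or add


def can_swap(first: UdfProperties, second: UdfProperties) -> bool:
    """Conservative check: two adjacent deterministic, record-at-a-time UDFs
    may be reordered if neither one reads or writes a field the other writes."""
    return (
        not (first.writes & second.reads)
        and not (second.writes & first.reads)
        and not (first.writes & second.writes)
    )


def push_down(pipeline: list[UdfProperties], index: int) -> list[UdfProperties]:
    """Move the operator at `index` toward the data source for as long as
    each adjacent swap passes the conflict check; returns a new list."""
    plan = list(pipeline)
    i = index
    while i > 0 and can_swap(plan[i - 1], plan[i]):
        plan[i - 1], plan[i] = plan[i], plan[i - 1]
        i -= 1
    return plan


# Hypothetical example operators on records with date/price/region fields.
extract = UdfProperties("extract_year", frozenset({"date"}), frozenset({"year"}))
tax = UdfProperties("add_tax", frozenset({"price"}), frozenset({"gross"}))
region_filter = UdfProperties("filter_region", frozenset({"region"}), frozenset())
year_filter = UdfProperties("filter_year", frozenset({"year"}), frozenset())

optimized = push_down([extract, tax, region_filter], 2)
print([op.name for op in optimized])
```

Here `filter_region` reads only `region`, which neither Map writes, so it moves to the front of the chain. A filter on `year` cannot move ahead of `extract_year`, because that UDF writes the field the filter reads. A real optimizer would also need to account for operators that change record cardinality or depend on groups of records, which this sketch deliberately ignores.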