Peeking into the optimization of data flow programs with MapReduce-style UDFs


Abstract

Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether the UDFs can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs, which is used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers, including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs which highlight the salient features of our approach.
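As a rough illustration of how information extracted from UDF code could drive reordering decisions, the sketch below assumes the static analysis yields, for each UDF, the set of fields it reads and the set of fields it writes. The abstract does not specify what is extracted, so `UdfInfo`, `can_reorder`, and the example operators are all hypothetical names, and the conflict rule shown is a simplified, conservative condition for record-at-a-time operators rather than the authors' actual test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfInfo:
    """Hypothetical summary a static code analysis might produce for one UDF."""
    name: str
    reads: frozenset   # fields the UDF inspects
    writes: frozenset  # fields the UDF modifies or adds


def can_reorder(a: UdfInfo, b: UdfInfo) -> bool:
    """Conservative check (an assumption, not the paper's rule):
    two record-at-a-time operators may be swapped if neither one
    writes a field that the other reads or writes."""
    return (
        a.writes.isdisjoint(b.reads)
        and b.writes.isdisjoint(a.reads)
        and a.writes.isdisjoint(b.writes)
    )


# Hypothetical operators in a small data flow.
normalize_price = UdfInfo("normalize_price", frozenset({"price"}), frozenset({"price"}))
filter_country = UdfInfo("filter_country", frozenset({"country"}), frozenset())
filter_price = UdfInfo("filter_price", frozenset({"price"}), frozenset())

# The country filter never touches "price", so it can be pushed below the map.
pushable = can_reorder(normalize_price, filter_country)
# The price filter reads a field the map rewrites, so the order must be kept.
blocked = can_reorder(normalize_price, filter_price)
```

Under these assumptions, `pushable` is `True` and `blocked` is `False`. A real optimizer would also have to account for grouping keys, the number of records each UDF emits, and cost estimates before choosing among the reordered candidates.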

Bibliographic details

Source: IEEE International Conference on Data Engineering | 2013 | pp. 1292-1295 | 4 pages
Authors: Hueske Fabian; Peters Mathias; Krettek Aljoscha; Ringwald Matthias
Original format: PDF

Similar literature

1. Implementation of dataflow programming based Fuzzy Logic algorithm for gas concentration index in around of Sidoarjo mudflow, Indonesia [J]. Edita Rosana Widasari, Barlian Henryranu Prasetio, Hurriyatul Fitriyah, MATEC Web of Conferences. 2018, no. 1
2. CloudFlow: A data-aware programming model for cloud workflow applications on modern HPC systems [J]. Fan Zhang, Qutaibah M. Malluhi, Tamer Elsayed, Future Generation Computer Systems. 2015
3. SOFA: An extensible logical optimizer for UDF-heavy data flows [J]. Rheinlaender Astrid, Heise Arvid, Hueske Fabian, Information Systems. 2015
4. Peeking into the optimization of data flow programs with MapReduce-style UDFs [C]. Hueske Fabian, Peters Mathias, Krettek Aljoscha, IEEE International Conference on Data Engineering. 2013
5. Path-sensitive, value-flow optimizations of programs [D]. Bodik, Rastislav. 1999
6. Optimization of a novel programmable data-flow crypto processor using NSGA-II algorithm [O]. Mahmoud T. El-Hadidi, Hany M. Elsayed, Karim Osama, 2018
7. Peeking into the Optimization of Data Flow Programs with MapReduce-style UDFs [O]. Fabian Hueske, Mathias Peters, Aljoscha Krettek, 2013
