Published in: International Conference on Very Large Data Bases (VLDB)

Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience

Abstract

Increasingly, organizations capture, transform and analyze enormous data sets. Prominent examples include internet companies and e-science. The Map-Reduce scalable dataflow paradigm has become popular for these applications. Its simple, explicit dataflow programming model is favored by some over the traditional high-level declarative approach: SQL. On the other hand, the extreme simplicity of Map-Reduce leads to much low-level hacking to deal with the many-step, branching dataflows that arise in practice. Moreover, users must repeatedly code standard operations such as join by hand. These practices waste time, introduce bugs, harm readability, and impede optimizations. Pig is a high-level dataflow system that aims at a sweet spot between SQL and Map-Reduce. Pig offers SQL-style high-level data manipulation constructs, which can be assembled in an explicit dataflow and interleaved with custom Map- and Reduce-style functions or executables. Pig programs are compiled into sequences of Map-Reduce jobs, and executed in the Hadoop Map-Reduce environment. Both Pig and Hadoop are open-source projects administered by the Apache Software Foundation. This paper describes the challenges we faced in developing Pig, and reports performance comparisons between Pig execution and raw Map-Reduce execution.
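To make concrete the kind of hand-written join code the abstract says users must repeat, here is a minimal in-memory sketch of a reduce-side equi-join expressed directly in map/shuffle/reduce terms. It is an illustration under stated assumptions, not Pig's or Hadoop's implementation: the helper names (`map_phase`, `shuffle`, `reduce_join`, `join`) and the sample `users`/`visits` relations are hypothetical, and Hadoop's distributed sort/shuffle is simulated with plain Python dicts and lists.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
Tagged = Tuple[str, Record]  # (source relation tag, record)


def map_phase(tag: str, records: Iterable[Record], key: str) -> List[Tuple[Any, Tagged]]:
    """Hypothetical Map step: emit (join_key, (tag, record)) so the reducer
    can tell which input relation each record came from."""
    return [(r[key], (tag, r)) for r in records]


def shuffle(pairs: Iterable[Tuple[Any, Tagged]]) -> Dict[Any, List[Tagged]]:
    """Stand-in for the framework's shuffle: group all mapped values by key."""
    groups: Dict[Any, List[Tagged]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def reduce_join(key: Any, values: List[Tagged], left_tag: str, right_tag: str) -> List[Record]:
    """Hypothetical Reduce step: split one key's values by source tag and
    emit their cross product (an inner join restricted to this key)."""
    left = [r for tag, r in values if tag == left_tag]
    right = [r for tag, r in values if tag == right_tag]
    # On a field-name collision the right record's value wins; here only the
    # join key collides, and it is equal on both sides by construction.
    return [{**l, **r} for l in left for r in right]


def join(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Run map -> shuffle -> reduce over both inputs to produce the inner join."""
    pairs = map_phase("L", left, key) + map_phase("R", right, key)
    out: List[Record] = []
    for k, vals in shuffle(pairs).items():
        out.extend(reduce_join(k, vals, "L", "R"))
    return out


if __name__ == "__main__":
    users = [{"user": "alice", "age": 30}, {"user": "bob", "age": 25}]
    visits = [
        {"user": "alice", "url": "a.com"},
        {"user": "alice", "url": "b.com"},
        {"user": "carol", "url": "c.com"},
    ]
    for row in join(users, visits, "user"):
        print(row)
```

Only `alice` appears in both inputs, so the join yields her two visits; `bob` and `carol` are dropped because the sketch implements an inner join. In Pig Latin the same logic is a single statement of the form `joined = JOIN users BY user, visits BY user;`, which is the kind of high-level construct the abstract describes Pig compiling down to Map-Reduce jobs.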
