Integrating Pig with Harp to Support Iterative Applications with Fast Cache and Customized Communication

Abstract

Using high-level scripting languages to solve big data problems has become a mainstream approach for sophisticated machine learning data analysis. Often data must be used in several steps of a computation to complete a full task. Composing default data transformation operators with the standard Hadoop MapReduce runtime is very convenient. However, the current strategy for supporting iterative applications in high-level languages on Hadoop MapReduce relies on an external wrapper script written in another language, such as Python or Groovy, which causes significant performance loss because mappers and reducers are restarted between jobs. In this paper, we reduce the extra job startup overhead by integrating Apache Pig with Harp, a high-performance Hadoop plug-in developed at Indiana University. This integration provides fast data caching and customized communication patterns among iterations for data analysis. The results show performance improvements by factors of 2 to 5.
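
The job startup overhead described in the abstract can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the paper's method or measurements: the function names and all numbers are hypothetical, startup cost is assumed fixed, every iteration is assumed to take the same compute time, and communication cost is ignored.

```python
# Toy cost model for iterative jobs.
# All names and numbers are hypothetical; nothing here is taken from the paper.

def wrapper_script_time(iterations, startup_s, compute_s):
    """External driver loop: each iteration submits a fresh job,
    so the startup cost is paid once per iteration."""
    return iterations * (startup_s + compute_s)


def long_running_time(iterations, startup_s, compute_s):
    """Long-running tasks with cached data: startup is paid once,
    then each iteration costs only its compute time."""
    return startup_s + iterations * compute_s


def speedup(iterations, startup_s, compute_s):
    """Ratio of the wrapper-script time to the long-running time."""
    slow = wrapper_script_time(iterations, startup_s, compute_s)
    fast = long_running_time(iterations, startup_s, compute_s)
    return slow / fast


if __name__ == "__main__":
    # Hypothetical workload: 10 iterations, 30 s startup, 10 s compute each.
    print(wrapper_script_time(10, 30, 10))  # 400
    print(long_running_time(10, 30, 10))    # 130
    print(round(speedup(10, 30, 10), 2))
```

In this simplified model the benefit grows with the ratio of startup time to per-iteration compute time, which is why avoiding repeated job restarts matters most for applications with many short iterations.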
