Published in: IEEE International Congress on Big Data

PortHadoop: Support direct HPC data processing in Hadoop

Abstract

The success of the Hadoop MapReduce programming model has greatly propelled research in big data analytics. In recent years, the High Performance Computing (HPC) community has shown growing interest in using Hadoop-based tools to process scientific data. This interest stems from three facts: data movement has become prohibitively expensive, high-performance data analytics has become an important part of HPC, and Hadoop-based tools can perform large-scale data processing in a time- and budget-efficient manner. In this study, we propose PortHadoop, an enhanced Hadoop architecture that enables MapReduce applications to read data directly from HPC parallel file systems (PFS). PortHadoop saves HDFS storage space and, more importantly, avoids the otherwise costly data copying. PortHadoop preserves all the semantics of the original Hadoop system and of the PFS. Therefore, Hadoop MapReduce applications can run on PortHadoop without code changes, except that the input file location is in the PFS rather than HDFS. Our experimental results show that PortHadoop operates effectively and efficiently with the PVFS2 and Ceph file systems.