Conference: IEEE International Conference on High Performance Computing and Communications

Mechanisms of Optimizing MapReduce Framework on High Performance Computer


Abstract

With data volumes growing constantly and exponentially, the industry faces an unprecedented challenge: processing tremendous amounts of data efficiently and reliably. High performance computers (HPC) have played a major role in big data processing thanks to their substantial computational power and very large storage capacity. However, HPC remains difficult to utilize efficiently because of its relatively low availability and usability. We propose implementing the MapReduce framework on HPC to address these problems and to broaden the application field of HPC. We design a workable plan for deploying Hadoop on HPC with a Lustre file system, and tune Lustre for better performance based on the nature of data access in Hadoop. A virtual memory disk is proposed to efficiently buffer temporary data and store intermediate data. By taking advantage of the high-speed interconnect of HPC, intermediate data can be transferred efficiently from map tasks to reduce tasks; this cannot be achieved in a Hadoop system on a server cluster, where the data flow rate is bounded by the bandwidth of a low-speed network such as Ethernet. An evaluation driven by the standard benchmarks provided in the Hadoop package shows that, after applying the proposed optimizations, the Hadoop system on HPC outperforms the Hadoop system on a server cluster, especially when handling data-intensive applications.
