
Mechanisms of Optimizing MapReduce Framework on High Performance Computer

Abstract

With data volumes growing exponentially, the industry faces an unprecedented challenge in processing very large amounts of data efficiently and reliably. High performance computers (HPC) have played a major role in big data processing thanks to their substantial computational power and very large storage capacity. However, HPC systems remain difficult to use efficiently because of their relatively low availability and usability. We propose implementing the MapReduce framework on HPC to address these problems and to broaden the range of HPC applications. We design a workable plan for deploying Hadoop on an HPC system with a Lustre file system, and tune Lustre for better performance based on Hadoop's data access patterns. We propose a virtual memory disk to buffer temporary data and store intermediate data efficiently. By taking advantage of the HPC system's high-speed interconnect, intermediate data can be transferred efficiently from map tasks to reduce tasks; this is not achievable in a Hadoop system on a server cluster, where the data flow rate is bounded by the bandwidth of a low-speed network such as Ethernet. An evaluation driven by the standard benchmarks provided in the Hadoop package shows that, after applying the proposed optimizations, Hadoop on HPC outperforms Hadoop on a server cluster, especially for data-intensive applications.
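The abstract does not give the authors' actual settings, so the following is only a minimal sketch of how such a deployment might be expressed in Hadoop configuration files. It assumes a Hadoop 2.x-style setup in which Lustre is POSIX-mounted on every node and a RAM-backed directory stands in for the virtual memory disk. The paths `/mnt/lustre/hadoop` and `/dev/shm/hadoop-local` and the helper `build_site_xml` are hypothetical; the property names `fs.defaultFS` and `mapreduce.cluster.local.dir` are standard Hadoop keys, but whether the authors used them is not stated.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths (assumptions, not from the paper): adjust to the real
# Lustre mount point and to a RAM-backed directory on each compute node.
LUSTRE_ROOT = "/mnt/lustre/hadoop"
RAMDISK_DIR = "/dev/shm/hadoop-local"


def build_site_xml(properties):
    """Return a Hadoop-style *-site.xml document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: use the local (POSIX) file system as the default file system,
# so job input and output paths such as LUSTRE_ROOT + "/input" resolve
# against the shared Lustre mount instead of HDFS.
core_site = build_site_xml({"fs.defaultFS": "file:///"})

# mapred-site.xml: keep map-side intermediate files on the RAM-backed
# directory. Under YARN, yarn.nodemanager.local-dirs may also need to point
# at the same location.
mapred_site = build_site_xml({"mapreduce.cluster.local.dir": RAMDISK_DIR})

if __name__ == "__main__":
    print(core_site)
    print(mapred_site)
```

Placing intermediate data in memory trades capacity for speed, so a sketch like this would only suit jobs whose shuffle data fits in the RAM available on each node.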