Journal: Parallel Computing

Accelerating big data analytics on HPC clusters using two-level storage

Abstract

Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS, or using the parallel file systems available on HPC clusters, to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former delivers memory-speed I/O performance, and the latter provides consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both reads and writes. We expect this two-level storage approach to provide insights into system design for big data analytics on HPC clusters. (C) 2016 Elsevier B.V. All rights reserved.
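The two-level idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's prototype and uses no Tachyon or OrangeFS API: the class name `TwoLevelStore`, the write-through policy, and the LRU eviction rule are all assumptions chosen for illustration, with plain Python dictionaries standing in for the in-memory file system and the parallel file system.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Toy two-level store (hypothetical, for illustration only).

    Upper level: a small in-memory tier with LRU eviction (assumed policy).
    Lower level: a dict standing in for a large parallel file system.
    """

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity  # max blocks kept in memory
        self.memory = OrderedDict()             # upper level, in LRU order
        self.backing = {}                       # lower level, assumed unbounded
        self.hits = 0
        self.misses = 0

    def _cache(self, key, value):
        # Place the block in the memory tier and evict the least recently
        # used blocks if the tier is over capacity.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)

    def write(self, key, value):
        # Write-through (assumed policy): the lower level always holds a
        # copy, so evicting from memory never loses data.
        self.backing[key] = value
        self._cache(key, value)

    def read(self, key):
        # Serve from memory when possible; otherwise fetch from the lower
        # level and promote the block into memory.
        if key in self.memory:
            self.hits += 1
            self.memory.move_to_end(key)
            return self.memory[key]
        self.misses += 1
        value = self.backing[key]  # raises KeyError if never written
        self._cache(key, value)
        return value
```

In this model, reads that hit the memory tier never touch the lower level, which is the intuition behind the memory-speed reads the abstract describes, while the lower level supplies the capacity and persistence. The `hits` and `misses` counters give a rough view of how often each level serves a request.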