Conference: IEEE International Parallel and Distributed Processing Symposium

NVMe-CR: A Scalable Ephemeral Storage Runtime for Checkpoint/Restart with NVMe-over-Fabrics

Abstract

Emerging SSDs with NVMe-over-Fabrics (NVMf) support provide new opportunities to significantly improve the performance of IO-intensive HPC applications. However, state-of-the-art parallel filesystems cannot extract the best possible performance from fast NVMe SSDs and are not designed for latency-critical ephemeral IO tasks, such as checkpoint/restart. In this paper, we propose a powerful abstraction called microfs to peel away unnecessary software layers and eliminate namespace coordination. Building upon this abstraction, we present the design of NVMe-CR, a scalable ephemeral storage runtime for clusters with disaggregated compute and storage. NVMe-CR proposes techniques such as metadata provenance, log record coalescing, and logically isolated shared device access, built around the microfs abstraction, to reduce the overhead of writing millions of concurrent checkpoint files. NVMe-CR utilizes high-density all-flash arrays accessible via NVMf to absorb bursty checkpoint IO and increase the progress rates of applications obliviously. Using the ECP CoMD application as a use case, results show that our runtime can achieve near-perfect (>0.96) efficiency at 448 processes and reduce checkpoint overhead by as much as 2x compared to state-of-the-art storage systems.
