
FPGA Checkpointing for Scientific Computing



Abstract

The use of FPGAs for computational workloads is becoming increasingly popular. They are more flexible than ASICs and consume less power than GPUs and CPUs. However, scientific applications run for long periods of time, and the hardware is always susceptible to failures caused by soft or hard errors. It is therefore important to protect these long-running jobs with fault-tolerance mechanisms. Checkpoint-Restart is a popular technique in high-performance computing that allows large-scale applications to cope with frequent failures. In this work, we approach the fault tolerance of heterogeneous CPU-FPGA applications at a high level, using the OmpSs@FPGA environment and a multi-level checkpointing library. We analyse the performance of several applications and characterise the overheads to expect from checkpointing computational workloads that run on FPGAs. Our results show overheads as low as 0.16% and 0.66% when checkpointing very frequently. This indicates that the technique is efficient and adds no significant overhead to the system. In addition, we present a proof of concept for checkpointing partial data of the FPGA task itself. This can be useful for workloads in which most of the data is offloaded to FPGA memory at once, rather than constantly moved between the accelerator and the CPU.
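The general Checkpoint-Restart idea can be illustrated with a minimal, single-level sketch in Python. It is not the OmpSs@FPGA interface or the API of the multi-level checkpointing library used in this work. The helper names (`load_checkpoint`, `save_checkpoint`, `run_with_checkpoints`, the `fail_at` fault injector) are hypothetical. The loop body only stands in for a task that would be offloaded to the FPGA. `overhead_pct` assumes that overhead is reported as the relative increase in wall-clock time over a run without checkpoints.

```python
import os
import pickle
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "acc": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then atomically replace the old checkpoint,
    so a crash during the write never destroys the last good copy."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def run_with_checkpoints(n_steps, path, ckpt_every, fail_at=None):
    """Iterative job protected by checkpoint-restart (illustrative only).

    fail_at is a hypothetical fault injector: the job raises when it
    reaches that step, emulating a soft or hard error.
    """
    state = load_checkpoint(path)  # on restart, resume from the last checkpoint
    while state["step"] < n_steps:
        if state["step"] == fail_at:
            raise RuntimeError(f"simulated failure at step {state['step']}")
        # Stand-in for one unit of work, e.g. a task offloaded to the FPGA.
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            save_checkpoint(path, state)
    return state["acc"]


def overhead_pct(t_with_ckpt, t_baseline):
    """Relative run-time overhead of checkpointing, in percent (assumed metric)."""
    return 100.0 * (t_with_ckpt - t_baseline) / t_baseline


def demo():
    """Crash mid-run, then restart and finish from the last checkpoint."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.ckpt")
        try:
            run_with_checkpoints(100, path, ckpt_every=10, fail_at=55)
        except RuntimeError:
            pass  # crashed at step 55; the last checkpoint was taken at step 50
        resumed_from = load_checkpoint(path)["step"]
        result = run_with_checkpoints(100, path, ckpt_every=10)
    return resumed_from, result


if __name__ == "__main__":
    print(demo())  # (50, 4950): same result as an uninterrupted run
```

Checkpointing more often reduces the work lost per failure (here, steps 50 to 54 are recomputed). The cost is more time spent writing state, which is what `overhead_pct` would capture. A multi-level library typically adds further storage levels, such as partner copies or a parallel file system, that are written at lower frequencies.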
