Journal: Concurrency and Computation: Practice and Experience

An investigation of the effects of hard and soft errors on graphics processing unit-accelerated molecular dynamics simulations


Abstract

Molecular dynamics (MD) simulations rely on the accurate evaluation and integration of Newton's equations of motion to propagate the positions of atoms in proteins during a simulation. As such, one can expect them to be sensitive to any form of numerical error that may occur during a simulation. Increasingly, graphics processing units (GPUs) are being used to accelerate MD simulations. Current GPU architectures designed for high-performance computing applications support error-correcting codes (ECC) that detect and correct single bit-flip soft error events in GPU memory; however, this error checking carries a penalty in simulation speed. ECC is also a major distinguishing feature between the high-performance computing NVIDIA Tesla cards and the considerably more cost-effective NVIDIA GeForce gaming cards. An argument often put forward against using GeForce cards is that their results are unreliable because they lack ECC. In an initial attempt to quantify these concerns, an investigation of the reproducibility of GPU-accelerated MD simulations using the AMBER software was conducted on the XSEDE supercomputer Keeneland, a cluster at Los Alamos National Laboratory, and a cluster at the San Diego Supercomputer Center. While the data collected are insufficient to draw solid conclusions and more extensive testing is needed to provide quantitative statistics, the absence of ECC events and of any silent errors in all the simulations conducted to date suggests that these errors are exceedingly rare, and as such the time and memory penalty of ECC may outweigh the utility of its error-checking functionality. However, a considerable number of errors originating from defective hardware were observed, which suggests that rigorous acceptance testing should be performed on new GPU-based systems by repeatedly running reproducible yet realistic calculations.
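The acceptance-testing recommendation can be illustrated with a minimal sketch. It assumes the same simulation input is run repeatedly on one GPU and that the MD engine produces bitwise-identical output for identical input on that hardware, so any run whose output differs from the majority points at a hardware fault rather than at the physics. The function name `find_divergent_runs` and the sample energy strings are hypothetical; in practice the compared outputs would be, for example, final energies or restart files written by the MD engine.

```python
import hashlib
from collections import Counter


def find_divergent_runs(outputs: list[bytes]) -> list[int]:
    """Return indices of runs whose output differs from the majority result.

    Hypothetical helper. Assumes each element of `outputs` is the raw output
    of the same deterministic simulation, repeated on the same GPU. On a
    healthy, bitwise-reproducible setup every digest should match.
    """
    if not outputs:
        return []
    # Hash each run so that large outputs can be compared cheaply.
    digests = [hashlib.sha256(out).hexdigest() for out in outputs]
    # Treat the most common digest as the reference (majority) result.
    reference, _ = Counter(digests).most_common(1)[0]
    # Any run that disagrees with the reference is flagged as suspect.
    return [i for i, d in enumerate(digests) if d != reference]


if __name__ == "__main__":
    # Hypothetical final energies from five repeated runs; run 3 diverged.
    runs = [b"Etot = -12345.6789"] * 3 + [b"Etot = -12345.6791"] + [b"Etot = -12345.6789"]
    print(find_divergent_runs(runs))  # [3]
```

Hashing whole outputs rather than comparing energies within a tolerance is a deliberate choice under the bitwise-reproducibility assumption: on the same card, even a last-digit difference is a red flag. If the engine is not deterministic on the hardware being tested, a tolerance-based comparison would be needed instead.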