Conference: IEEE International Parallel Distributed Processing Symposium

Characterization of Impact of Transient Faults and Detection of Data Corruption Errors in Large-Scale N-Body Programs Using Graphics Processing Units

Abstract

In N-body programs, the trajectories of simulated particles exhibit chaotic patterns if errors are present in the initial conditions or occur during some computation steps. It was believed that the global properties (e.g., total energy) of simulated particles are unlikely to be affected by a small number of such errors. In this paper, we present a quantitative analysis of the impact of transient faults in GPU devices on a global property of simulated particles. We experimentally show that a single-bit error in non-control data can change the final total energy of a large-scale N-body program with ~2.1% probability. We also find that the corrupted total energy values are biased (e.g., they are not normally distributed), which can be used to reduce the expected number of re-executions. We also present a data error detection technique for N-body programs that uses two types of properties that hold in simulated physical models. Combined with an existing redundancy-based technique, the presented technique covers many data errors (e.g., >97.5%) with a small performance overhead (e.g., 2.3%).
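As a rough illustration of the kind of experiment the abstract describes, the sketch below flips a single bit in one particle coordinate and checks whether the total energy changes. It is not the paper's fault-injection setup or its detection technique, and it runs on the CPU rather than a GPU. The function names (`flip_bit`, `total_energy`, `energy_changed`), the softening parameter `eps`, and the tolerance `rel_tol` are all assumptions made for this example.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of x's IEEE 754 binary64 encoding."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


def total_energy(pos, vel, mass, G=1.0, eps=0.0):
    """Kinetic energy plus (optionally softened) pairwise gravitational potential."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(mass, vel))
    potential = 0.0
    n = len(mass)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
            potential -= G * mass[i] * mass[j] / math.sqrt(r2 + eps * eps)
    return kinetic + potential


def energy_changed(e_ref, e_now, rel_tol=1e-6):
    """Hypothetical check: report a change if energy drifts beyond rel_tol."""
    return abs(e_now - e_ref) > rel_tol * abs(e_ref)


# Two unit-mass bodies one unit apart, moving in opposite directions.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
vel = [(0.0, 0.5, 0.0), (0.0, -0.5, 0.0)]
mass = [1.0, 1.0]
e_ref = total_energy(pos, vel, mass)

# Flip a high mantissa bit (bit 51) of the second body's x coordinate: 1.0 -> 1.5.
corrupted_hi = [pos[0], (flip_bit(pos[1][0], 51), 0.0, 0.0)]
# Flip the lowest mantissa bit (bit 0): the coordinate changes by one ulp.
corrupted_lo = [pos[0], (flip_bit(pos[1][0], 0), 0.0, 0.0)]

hi_detected = energy_changed(e_ref, total_energy(corrupted_hi, vel, mass))
lo_detected = energy_changed(e_ref, total_energy(corrupted_lo, vel, mass))
```

In this toy setup the high-order flip shifts the energy well beyond the tolerance, while the low-order flip does not. This shows how a single-bit error may or may not affect a global property, depending on which bit is hit.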