Conference: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

Impact of GPUs Parallelism Management on Safety-Critical and HPC Applications Reliability


Abstract

Graphics Processing Units (GPUs) offer high computational power, but managing their parallel processes places a heavy strain on the scheduler, which increases the GPU cross section. The results of extensive neutron radiation experiments performed on NVIDIA GPUs confirm this hypothesis. Reducing the application's Degree Of Parallelism (DOP) reduces the scheduling strain but also modifies the GPU parallelism management, including memory latency, the number of registers per thread, and processor occupancy, all of which influence the sensitivity of the parallel application. An analysis of how the overall GPU radiation sensitivity depends on the code DOP is provided, and the most reliable configuration is identified experimentally. Finally, modifying the parallelism management affects not only the GPU cross section but also the code execution time and, thus, the exposure to radiation required to complete the computation. The Mean Workload Between Failures and Mean Executions Between Failures metrics are introduced to evaluate the workload or the number of executions computed correctly by the GPU on a realistic application.
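The abstract names the two metrics without defining them. The sketch below is one plausible reading, not the paper's stated formulas: it assumes the failure rate is the cross section multiplied by the particle flux, that Mean Executions Between Failures (MEBF) is the mean time between failures divided by the execution time, and that Mean Workload Between Failures (MWBF) is MEBF multiplied by the workload of one execution. All function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
def failure_rate(cross_section_cm2: float, flux_per_cm2_s: float) -> float:
    """Failures per second, ASSUMING rate = cross section x particle flux."""
    return cross_section_cm2 * flux_per_cm2_s


def mebf(cross_section_cm2: float, flux_per_cm2_s: float,
         exec_time_s: float) -> float:
    """Mean Executions Between Failures (assumed: MTBF / execution time)."""
    mtbf_s = 1.0 / failure_rate(cross_section_cm2, flux_per_cm2_s)
    return mtbf_s / exec_time_s


def mwbf(cross_section_cm2: float, flux_per_cm2_s: float,
         exec_time_s: float, workload_per_exec: float) -> float:
    """Mean Workload Between Failures (assumed: workload per run x MEBF)."""
    return workload_per_exec * mebf(cross_section_cm2, flux_per_cm2_s,
                                    exec_time_s)


if __name__ == "__main__":
    flux = 1.0  # hypothetical flux in particles / (cm^2 * s)

    # Two made-up configurations of the same code at different DOP:
    # "fast" has the larger cross section but finishes sooner,
    # "slow" has half the cross section but runs four times longer.
    configs = {
        "fast": {"cross_section_cm2": 2e-8, "exec_time_s": 1.0},
        "slow": {"cross_section_cm2": 1e-8, "exec_time_s": 4.0},
    }
    for name, cfg in configs.items():
        runs = mebf(cfg["cross_section_cm2"], flux, cfg["exec_time_s"])
        print(f"{name}: MEBF = {runs:.3g} executions")
```

Under these assumptions, the configuration with the smaller cross section is not automatically the more reliable one: its longer execution time lengthens the exposure per run, which is the trade-off the abstract describes.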
