Journal: Parallel Computing

Modeling and characterizing GPGPU reliability in the presence of soft errors



Abstract

General-purpose computing on graphics processing units (GPGPU) has become increasingly popular due to its high computational throughput for data-parallel applications. Modern GPU architectures have limited capability for error detection and fault tolerance because they were originally designed for graphics processing. General-purpose applications, however, require rigorous execution correctness, which makes reliability a growing concern in GPGPU architecture design. As CMOS process technologies continue to scale down to the nanoscale, the on-chip soft-error rate (SER) is predicted to increase exponentially. GPGPUs, which integrate hundreds of cores on a single chip, are prone to high SER. This paper takes a first step toward modeling and characterizing GPGPU reliability in the presence of soft errors. We develop GPGPU-SODA (GPGPU Software Dependability Analysis), a framework for estimating the soft-error vulnerability of the GPGPU microarchitecture. Using GPGPU-SODA, we observe that several microarchitecture structures in GPGPUs exhibit high soft-error susceptibility, and that structure vulnerability is sensitive to workload characteristics (e.g., branch divergence and memory access patterns). We further investigate the impact of several architectural optimizations on GPU soft-error robustness. For example, we find that increasing the number of threads supported by the GPU significantly affects GPGPU soft-error robustness, whereas changing the warp scheduling policy has little impact on structure vulnerability. These observations give designers useful guidance for building resilient GPGPUs: a comprehensive resiliency solution should consider the entire GPGPU design rather than focusing solely on a particular structure.
