2011 IEEE 29th International Conference on Computer Design

Improving GPU Robustness by making use of faulty parts

Abstract

With hundreds of processing units in current state-of-the-art graphics processing units (GPUs), the probability that one or more processing units fail due to permanent faults, whether during fabrication or after deployment, increases drastically. In our experiments we found that the loss of a single streaming multiprocessor (SM) in an 8-SM GPU resulted in as much as a 16% performance loss. The default method for dealing with faulty SMs is to turn them off. Although a faulty SM cannot be trusted to execute a complete kernel (the program assigned to an SM) correctly, we show that faulty SMs can still improve system throughput by generating high-level hints and supplying them to the functional SMs. With the faulty SMs supplying hints to the functional SMs, we achieve an average speed-up of about 16% over the baseline case, in which the faulty SMs are turned off. The proposed technique requires minimal hardware overhead and is highly scalable.
