Conference: Design, Automation and Test in Europe Conference and Exhibition

High-performance, energy-efficient, fault-tolerant network-on-chip design using reinforcement learning

Abstract

Networks-on-chip (NoCs) are becoming the standard communication fabric for multi-core and system-on-chip (SoC) architectures. As technology continues to scale, on-chip transistors and wires are becoming increasingly vulnerable to various fault mechanisms, especially timing errors, which degrade the energy efficiency and performance of NoCs. Typical techniques for handling timing errors are reactive: they respond to faults only after they occur. They rely on error detection/correction hardware that is constantly enabled, which results in excessive power consumption and degraded performance. On the other hand, indiscriminately disabling error-handling hardware can induce more errors and intrusive retransmission traffic. The challenge, therefore, is to balance the trade-offs among error rate, packet retransmission, performance, and energy. In this paper, we propose a proactive fault-tolerant mechanism that uses reinforcement learning (RL) to optimize energy efficiency and performance. First, we propose a new proactive error-handling technique comprising a dynamic scheme for enabling per-router error detection/correction hardware and an effective retransmission mechanism. Second, we propose using RL to train the dynamic control policy, with the goals of increased fault tolerance, reduced power consumption, and improved performance compared with conventional techniques. Our evaluation indicates that, on average, end-to-end packet latency is lowered by 55%, energy efficiency is improved by 64%, and retransmissions caused by faults are reduced by 48% relative to reactive error correction techniques.
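The abstract does not say which RL algorithm, state features, actions, or reward the authors use. Purely as an illustration of the general idea (a per-router agent learning when to enable its error detection/correction hardware), the following is a minimal sketch using tabular Q-learning in a toy environment. Everything in it is an assumption made for this example and not taken from the paper: the two-level error state, the on/off action set, the error probabilities, and the cost values.

```python
import random

# Hypothetical modes for one router's error-handling hardware (assumption).
ACTIONS = ("ecc_off", "ecc_on")
# Hypothetical coarse state: the router's recently observed error level (assumption).
STATES = ("low_errors", "high_errors")

# Illustrative numbers only; none of them come from the paper.
ERROR_PROB = {"low_errors": 0.02, "high_errors": 0.4}  # chance of a fault per interval
ECC_COST = 1.0          # cost of keeping the correction hardware enabled
RETRANSMIT_COST = 10.0  # cost of a retransmission after an unprotected fault


def step(state, action, rng):
    """Toy environment: return (reward, next_state) for one control interval."""
    fault = rng.random() < ERROR_PROB[state]
    cost = 0.0
    if action == "ecc_on":
        cost += ECC_COST          # hardware is on, so any fault is corrected in place
    elif fault:
        cost += RETRANSMIT_COST   # unprotected fault forces a retransmission
    next_state = rng.choice(STATES)  # error level drifts independently of the action
    return -cost, next_state


def train(steps=50000, alpha=0.02, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                        # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward, next_state = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update toward the bootstrapped target.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these assumed costs, leaving the hardware off is cheaper on average when errors are rare and more expensive when they are frequent, so the learned policy tends to enable correction only in the high-error state. A realistic controller would need richer state (for example temperature, link utilization, or recent retransmission counts) and a reward that reflects measured latency and energy, which the abstract does not detail.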