IEEE International Symposium on High Performance Computer Architecture (HPCA)

iNPG: Accelerating Critical Section Access with In-network Packet Generation for NoC Based Many-Cores

Abstract

As recent studies have shown, the serialized competition overhead of entering a critical section limits the performance of multi-threaded shared-variable applications on NoC-based many-cores more than the execution of the critical section itself. We show that the invalidation-acknowledgement delay for cache coherence between the home node storing the critical-section lock and the cores running competing threads is the leading cause of this high competition overhead during lock spinning, which occurs in various spin-lock primitives (such as the ticket lock, ABQL, and the MCS lock) and in the spinning phase of the queue spin-lock (QSL) used in advanced operating systems. To reduce this lock coherence overhead, we propose in-network packet generation (iNPG), which turns passive "normal" NoC routers, which only transmit packets, into active "big" routers that can also generate packets. Instead of performing all coherence maintenance at the home node, big routers deployed nearer to the competing threads generate packets that perform early invalidation-acknowledgement for failing threads before their requests reach the home node. This shortens the protocol round-trip delay and thus significantly reduces competition overhead across locking primitives. We evaluate iNPG in Gem5 using PARSEC and SPEC OMP2012 programs with five different locking primitives. Compared to a state-of-the-art technique for accelerating critical section access, experimental results show that iNPG effectively reduces lock coherence overhead, expediting critical section access by 1.35x on average and 2.03x at maximum, and consequently improving program Region-of-Interest (ROI) runtime by 7.8% on average and 14.7% at maximum.
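To make the round-trip argument concrete, here is a toy hop-count model. It is not the paper's protocol: it assumes a 2D mesh with XY (dimension-ordered) routing, treats every link traversal as equal cost, and supposes that the first "big" router on a failing request's path to the home node answers it in place of the home node. The functions `xy_route` and `round_trip_hops`, the mesh coordinates, and the router placement are all hypothetical, and the model ignores router pipeline latency, contention, and the invalidation traffic sent to other sharers.

```python
from typing import List, Set, Tuple

# Toy model (assumption): routers sit on integer (x, y) coordinates of a 2D mesh.
Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited by a packet under XY (dimension-ordered) routing, endpoints included."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:  # travel along X first ...
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:  # ... then along Y
        y += step_y
        path.append((x, y))
    return path


def round_trip_hops(core: Coord, home: Coord, big_routers: Set[Coord]) -> Tuple[int, int]:
    """Return (baseline, early) link traversals for one failing lock request.

    baseline: the request goes core -> home and the reply comes back home -> core.
    early:    hypothetical case where the first big router on the request's
              XY path answers it, so the reply starts from that router.
    On a minimal route the hop count equals the Manhattan distance, so the
    reply leg costs as many hops as the request leg.
    """
    path = xy_route(core, home)
    baseline = 2 * (len(path) - 1)
    for hops, router in enumerate(path[1:], start=1):
        if router in big_routers:
            return baseline, 2 * hops
    return baseline, baseline  # no big router on the path: no shortcut


if __name__ == "__main__":
    # 8x8 mesh: core in one corner, the lock's home node in the opposite corner.
    print(round_trip_hops((0, 0), (7, 7), {(3, 0)}))  # prints (28, 6)
    print(round_trip_hops((0, 0), (7, 7), {(0, 5)}))  # prints (28, 28): router is off the XY path
```

With a big router three hops from the requesting core, the exchange drops from 28 to 6 link traversals in this model, while a big router that does not lie on the request's route gives no benefit. Picking the first big router on the path is simply the earliest point at which this toy model can intercept the request.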