iNPG: Accelerating Critical Section Access with In-network Packet Generation for NoC Based Many-Cores


Abstract

Recent studies show that, on NoC-based many-cores, the serialized competition overhead of entering a critical section limits the performance of multi-threaded shared-variable applications more than the execution of the critical section itself. We show that the invalidation-acknowledgement delay for cache coherence between the home node storing the critical section lock and the cores running competing threads is the leading contributor to the high competition overhead of lock spinning, which occurs in various spin-lock primitives (such as the ticket lock, ABQL, and the MCS lock) and in the spinning phase of the queue spin-lock (QSL) in advanced operating systems. To reduce this lock coherence overhead, we propose in-network packet generation (iNPG), which turns passive "normal" NoC routers that only transmit packets into active "big" routers that can also generate packets. Instead of performing all coherence maintenance at the home node, big routers deployed nearer to the competing threads generate packets that perform early invalidation-acknowledgement for failing threads before their requests reach the home node. This shortens the protocol round-trip delay and thus significantly reduces competition overhead in various locking primitives. We evaluate iNPG in Gem5 using PARSEC and SPEC OMP2012 programs with five different locking primitives. Compared with a state-of-the-art technique for accelerating critical section access, experimental results show that iNPG effectively reduces lock coherence overhead, expediting critical section access by 1.35x on average and 2.03x at maximum, and consequently improving program Region-of-Interest (ROI) runtime by 7.8% on average and 14.7% at maximum.
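
For readers unfamiliar with the spin-lock primitives named in the abstract, the following is a minimal sketch of a ticket lock in C++. It is not taken from the paper; the class name `TicketLock`, its member names, and the helper `run_contended_increments` are hypothetical and chosen only for illustration. The point it is meant to show is that every waiting thread spins on the same shared word, so on a typical invalidation-based coherence protocol each release has to invalidate the waiters' cached copies, which is the kind of lock coherence traffic the abstract refers to.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative ticket lock (hypothetical names, not the paper's code).
class TicketLock {
public:
    void lock() {
        // Draw the next ticket with a single atomic read-modify-write.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Spin until our ticket is served. All waiters read the same
        // now_serving_ word, so each unlock() touches every spinning copy.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();  // be polite on oversubscribed hosts
        }
    }

    void unlock() {
        // Only the current holder writes now_serving_, so a plain
        // load-then-store pair is sufficient here.
        const unsigned next =
            now_serving_.load(std::memory_order_relaxed) + 1;
        now_serving_.store(next, std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Hypothetical helper: several threads compete for one critical section
// that increments a plain (non-atomic) counter guarded only by the lock.
long run_contended_increments(int num_threads, int iters_per_thread) {
    assert(num_threads > 0 && iters_per_thread >= 0);
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_contended_increments(4, 1000)` should return 4000 if mutual exclusion holds. The sketch says nothing about iNPG itself; it only shows where the contended spinning that the paper targets comes from in software.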


Bibliographic record

Source
IEEE International Symposium on High Performance Computer Architecture | 2018 | pp. 15-26 | 12 pages
Authors
Yuan Yao; Zhonghai Lu
Original format: PDF
Keywords
Instruction sets; Spinning; Liquid crystal on silicon; Coherence; Acceleration; Routing protocols


