Conference: The 39th International Conference on Parallel Processing

A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

Abstract

Barrier synchronization in shared memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, thus creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have been recently used to enhance a flow control mechanism (EVC) in the context of networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network, and that is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP performance simulator that models a 32-core CMP with a 2D-mesh network configuration. Our proposal entails average reductions in terms of execution time of 68% and 21% for kernels and scientific applications, respectively. Additionally, network traffic is also lowered by 74% and 18%, respectively.
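
To make the busy-waiting approach concrete, the following is a minimal sketch of a centralized sense-reversing barrier, one common software barrier built on shared variables. It is an illustration only: it is neither the G-line hardware mechanism proposed in the paper nor its binary combining-tree baseline. The names `SenseReversingBarrier` and `run_demo` are hypothetical, and a lock stands in for the atomic fetch-and-decrement that a real shared-memory implementation would likely use.

```python
import threading
import time


class SenseReversingBarrier:
    """Centralized busy-wait barrier: one shared counter plus one shared flag.

    Hypothetical illustration; not the mechanism described in the paper.
    """

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = n_threads          # threads still expected in this episode
        self.sense = False              # shared flag that every waiter spins on
        self.lock = threading.Lock()    # stands in for an atomic decrement
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Each thread flips its private sense on every barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense

        with self.lock:
            self.count -= 1
            last = self.count == 0

        if last:
            # The last arriver resets the counter, then releases the others.
            self.count = self.n_threads
            self.sense = my_sense
        else:
            # Busy-wait on the shared flag. On real hardware, every waiter
            # polling this one location is a source of the hot-spot.
            while self.sense != my_sense:
                time.sleep(0)  # yield so other Python threads can run


def run_demo(n_threads=4, n_phases=3):
    """Run n_threads workers through n_phases barrier episodes.

    Returns the shared log of phase numbers in the order they were recorded.
    """
    barrier = SenseReversingBarrier(n_threads)
    log = []

    def worker():
        for phase in range(n_phases):
            log.append(phase)  # work for this phase
            barrier.wait()     # nobody starts phase + 1 until all arrive

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_demo(4, 3)` returns a log in which all entries for one phase precede any entry for the next, which is the ordering guarantee a barrier provides. Every thread touches the same counter and spins on the same flag, so contention on those locations grows with the thread count. A combining-tree barrier reduces this by spreading the counters across the nodes of a tree.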
