A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

Abstract

Barrier synchronization in shared memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, thus creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have been recently used to enhance a flow control mechanism (EVC) in the context of networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network, and that is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP performance simulator that models a 32-core CMP with a 2D-mesh network configuration. Our proposal entails average reductions in terms of execution time of 68% and 21% for kernels and scientific applications, respectively. Additionally, network traffic is also lowered by 74% and 18%, respectively.
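For readers unfamiliar with the software baseline named in the abstract, the following is a minimal sketch of a binary combining-tree barrier with sense reversal that busy-waits on shared variables. It is an illustration of the general technique only, not the authors' implementation and not the G-line hardware mechanism. All names (`Node`, `build_tree`, `run_demo`) are hypothetical, the thread count is assumed to be a power of two, and a `threading.Lock` stands in for the atomic fetch-and-decrement a real machine would use.

```python
import threading
import time


class Node:
    """One node of the combining tree; it completes after `fan_in` arrivals."""

    def __init__(self, parent, fan_in=2):
        self.parent = parent
        self.fan_in = fan_in
        self.count = fan_in
        self.sense = False
        self.lock = threading.Lock()  # assumption: stands in for an atomic decrement

    def arrive(self, my_sense):
        with self.lock:
            self.count -= 1
            last = self.count == 0
        if last:
            # The last arriver carries the arrival one level up the tree.
            if self.parent is not None:
                self.parent.arrive(my_sense)
            self.count = self.fan_in  # reset for the next barrier episode
            self.sense = my_sense     # release the waiters at this node
        else:
            # Busy-wait on a shared flag; sleep(0) only yields the interpreter.
            while self.sense != my_sense:
                time.sleep(0)


def build_tree(n_threads):
    """Return the leaf nodes of a binary tree; thread i uses leaves[i // 2]."""
    assert n_threads >= 2 and n_threads & (n_threads - 1) == 0
    level = [Node(None)]
    while len(level) * 2 < n_threads:
        level = [Node(parent) for parent in level for _ in range(2)]
    return level


def run_demo(n_threads=8, rounds=20):
    """Return True if no thread ever left a barrier before all had arrived."""
    leaves = build_tree(n_threads)
    arrived = [0] * rounds
    tally = threading.Lock()
    early = []

    def worker(tid):
        sense = True  # thread-local sense, flipped after every episode
        leaf = leaves[tid // 2]
        for r in range(rounds):
            with tally:
                arrived[r] += 1
            leaf.arrive(sense)
            if arrived[r] != n_threads:
                early.append((tid, r))
            sense = not sense

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not early


if __name__ == "__main__":
    print(run_demo())
```

With 32 threads this sketch builds 16 leaf nodes and a tree five levels deep, so each barrier episode propagates arrivals up five levels and releases back down, all through shared flags and counters. That is the memory and network traffic the abstract's dedicated G-line network is meant to avoid.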

Bibliographic details

Source: The 39th International Conference on Parallel Processing, 2010, pp. 267-276 (10 pages)
Authors: Abellan Jose L.; Fernandez Juan; Acacio Manuel E.
Original format: PDF
Chinese Library Classification: Parallel computers
