Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters

Glaser Florian; Tagliavini Giuseppe; Rossi Davide; Haugou Germain; Huang Qiuting; Benini Luca


Abstract

The steeply growing performance demands on highly power- and energy-constrained processing systems, such as end-nodes of the Internet-of-Things (IoT), have led to parallel near-threshold computing (NTC), which combines the energy-efficiency benefits of low-voltage operation with the performance typical of parallel systems. Shared-L1-memory multiprocessor clusters are a promising architecture, delivering performance on the order of GOPS and an energy efficiency of over 100 GOPS/W. However, this level of computational efficiency can only be reached by maximizing the effective utilization of the processing elements (PEs) available in the clusters. Alongside this effort, optimizing PE-to-PE synchronization and communication is a critical factor for performance. In this article, we describe a lightweight hardware-accelerated synchronization and communication unit (SCU) for tightly-coupled clusters of processors. We detail the architecture, which enables fine-grain per-PE power management, and its integration into an eight-core cluster of RISC-V processors. To validate the effectiveness of the proposed solution, we implemented the eight-core cluster in advanced 22 nm FDX technology and evaluated performance and energy efficiency with tunable microbenchmarks and a set of real-life applications and kernels. When the microbenchmarks are constrained to 10 percent synchronization overhead, the proposed solution allows synchronization-free regions as small as 42 cycles, over 41x smaller than with the baseline implementation, which is based on fast test-and-set access to L1 memory. When evaluated on the real-life DSP applications, the proposed SCU improves performance by up to 92 percent (23 percent on average) and energy efficiency by up to 98 percent (39 percent on average).
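To put the microbenchmark figures in perspective, the short Python sketch below works through the arithmetic. It is an illustration only, not the authors' evaluation method: the overhead definition (synchronization cycles divided by total cycles), the function name `min_sfr_cycles`, and the 5-cycle example cost are assumptions that do not come from the abstract. Only the 42-cycle and 41x figures are taken from it.

```python
from fractions import Fraction
from math import ceil


def min_sfr_cycles(sync_cycles: int, overhead_percent: int) -> int:
    """Smallest synchronization-free region (SFR), in cycles, that keeps the
    share of cycles spent synchronizing at or below overhead_percent.

    Assumed definition (not stated in the abstract):
        overhead = sync / (sync + sfr)
    """
    if not 0 < overhead_percent < 100:
        raise ValueError("overhead_percent must be strictly between 0 and 100")
    overhead = Fraction(overhead_percent, 100)
    # Solving sync / (sync + sfr) <= overhead for sfr gives
    # sfr >= sync * (1 - overhead) / overhead; exact fractions avoid
    # floating-point rounding before the ceiling is taken.
    return ceil(sync_cycles * (1 - overhead) / overhead)


# Figures quoted in the abstract.
SCU_MIN_SFR = 42       # cycles, at 10 percent synchronization overhead
BASELINE_FACTOR = 41   # the baseline needs an SFR "over 41x" larger

# Lower bound on the baseline's smallest SFR implied by those two figures.
baseline_min_sfr = SCU_MIN_SFR * BASELINE_FACTOR

# Hypothetical example: a primitive costing 5 cycles per synchronization.
example_min_sfr = min_sfr_cycles(5, 10)
```

Under the assumed definition, a hypothetical primitive costing 5 cycles would need regions of at least 45 cycles to stay within 10 percent overhead. The two quoted figures imply that the test-and-set baseline needs regions of more than 1722 cycles to meet the same constraint.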


Bibliographic record

Source
IEEE Transactions on Parallel and Distributed Systems, 2021, No. 3, pp. 633-648 (16 pages)
Authors
Glaser Florian; Tagliavini Giuseppe; Rossi Davide; Haugou Germain; Huang Qiuting; Benini Luca
Author affiliations

Swiss Fed Inst Technol Dept Informat Technol & Elect Engn D ITET Zurich Switzerland;

Univ Bologna Dept Elect Elect & Informat Engn Bologna Italy;

Univ Bologna Dept Elect Elect & Informat Engn Bologna Italy;

Swiss Fed Inst Technol Dept Informat Technol & Elect Engn D ITET Zurich Switzerland;

Swiss Fed Inst Technol Dept Informat Technol & Elect Engn D ITET Zurich Switzerland;

Swiss Fed Inst Technol Dept Informat Technol & Elect Engn D ITET Zurich Switzerland|Univ Bologna Dept Elect Elect & Informat Engn Bologna Italy;

Original format: PDF
Text language: English
Keywords
Synchronization; Hardware; Complexity theory; Parallel processing; Kernel; Benchmark testing; Computer architecture; Energy-efficient embedded parallel computing; fine-grain parallelism; tightly memory-coupled multiprocessors;


