Inter-Thread Communication in Multithreaded, Reconfigurable Coarse-Grain Arrays

Abstract

Traditional von Neumann GPGPUs only allow threads to communicate through memory on a group-to-group basis. In this model, a group of producer threads writes intermediate values to memory, which are read by a group of consumer threads after a barrier synchronization. To alleviate the memory bandwidth imposed by this method of communication, GPGPUs provide a small scratchpad memory that prevents intermediate values from overloading DRAM bandwidth. In this paper we introduce direct inter-thread communications for massively multithreaded CGRAs, where intermediate values are communicated directly through the compute fabric on a point-to-point basis. This method avoids the need to write values to memory, eliminates the need for a dedicated scratchpad, and avoids workgroup global barriers. We introduce our proposed extensions to the programming model (CUDA) and execution model, as well as the hardware primitives that facilitate the communication. Our simulations of Rodinia benchmarks running on the new system show that direct inter-thread communication provides an average speedup of 2.8x (10.3x max) and reduces system power by an average of 5x (22x max), when compared to an equivalent Nvidia GPGPU.
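
The two communication styles contrasted in the abstract can be illustrated with a small CPU-thread analogy in Python. This is only a sketch under stated assumptions: it does not reproduce the paper's CUDA extensions or hardware primitives, and the names `via_memory_and_barrier`, `via_point_to_point`, `run_group`, and the group size `N` are hypothetical. The first function routes intermediate values through a shared buffer followed by a group-wide barrier; the second hands each value directly to the single thread that consumes it.

```python
import queue
import threading

N = 4  # threads per group (illustrative size, not taken from the paper)


def run_group(worker):
    """Start N threads running worker(tid) and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def via_memory_and_barrier():
    """Group-to-group: every thread writes to a shared buffer, then all wait."""
    shared = [None] * N              # stands in for scratchpad or global memory
    out = [None] * N
    barrier = threading.Barrier(N)

    def worker(tid):
        shared[tid] = tid * tid      # produce an intermediate value
        barrier.wait()               # nobody reads until everybody has written
        out[tid] = shared[tid] + shared[(tid + 1) % N]  # own value + neighbour's

    run_group(worker)
    return out


def via_point_to_point():
    """Point-to-point: each value goes straight to the one thread that needs it."""
    inbox = [queue.Queue(maxsize=1) for _ in range(N)]  # inbox[i] belongs to thread i
    out = [None] * N

    def worker(tid):
        value = tid * tid
        inbox[(tid - 1) % N].put(value)      # thread tid-1 consumes this value
        out[tid] = value + inbox[tid].get()  # blocks only on the one producer it needs

    run_group(worker)
    return out


if __name__ == "__main__":
    print(via_memory_and_barrier())
    print(via_point_to_point())
```

Both functions compute the same result, but the second needs neither a shared buffer nor a barrier across the whole group: each consumer waits only for its own producer, which is the property the abstract attributes to direct inter-thread communication.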


Bibliographic details

Source
International Symposium on Microarchitecture | 2018 | xxiv, 493 p. | 13 pages
Authors
Dani Voitsechov; Oron Port; Yoav Etsion
Original format: PDF
Chinese Library Classification: TP302-532
Keywords
Message systems; Instruction sets; Synchronization; Convolution; Computational modeling; Graphics processing units; Computer architecture


Similar literature

1. Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression [J]. Lee Hochan, Moghaddam Mansureh S., Suh Dongkwan. ACM Transactions on Architecture and Code Optimization, 2018, No. 1

2. Physical resource binding for a coarse-grain reconfigurable array using evolutionary algorithms [J]. Ma F., Knight J.P., Plett C. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005, No. 5

3. Reconfigurable 2×1 CPW-Fed Rectangular Slot Antenna Array (RSAA) Based on Graphene for Wireless Communications [J]. Elsheakh Dalia M., Dardeer Osama M. Applied Computational Electromagnetics Society Journal, 2021, No. 6

4. Inter-Thread Communication in Multithreaded, Reconfigurable Coarse-Grain Arrays [C]. Dani Voitsechov, Oron Port, Yoav Etsion. Annual IEEE/ACM International Symposium on Microarchitecture, 2018

5. Reconfigurable multithreaded processors for programmable communication systems [D]. Mamidi, Suman. 2007

6. 3D Radiation Pattern Reconfigurable Phased Array for Transmission Angle Sensing in 5G Mobile Communication [O]. Jin Zhang, Shuai Zhang, Xianqi Lin. 2018

7. Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression [O]. Hochan Lee, Mansureh S. Moghaddam, Dongkwan Suh. 2018

8. Telecommunications Protocol Processing Subsystem Using Reconfigurable Interoperable Gate Arrays [R]. Pang, Jackson; Pingree, Paula; Torgerson, J. Leigh. 2006

