Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers



Abstract

We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to those commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data-parallel tasks, we present barrier filters, a mechanism for fast barrier synchronization on chip multiprocessors that enables vector computations to be efficiently distributed across the cores of a CMP. We ensure that all threads arriving at a barrier require an unavailable cache line to proceed, and, by placing additional hardware in the shared portions of the memory subsystem, we starve their requests until they all have arrived. Specifically, our approach uses invalidation requests both to make cache lines unavailable and to identify when a thread has reached the barrier. We examine two types of barrier filters, one synchronizing through instruction cache lines and the other through data cache lines.
