Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers



Abstract

We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to those commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data-parallel tasks, we present barrier filters, a mechanism for fast barrier synchronization on chip multiprocessors that enables vector computations to be efficiently distributed across the cores of a CMP. We ensure that all threads arriving at a barrier require an unavailable cache line to proceed, and, by placing additional hardware in the shared portions of the memory subsystem, we starve their requests until they all have arrived. Specifically, our approach uses invalidation requests both to make cache lines unavailable and to identify when a thread has reached the barrier. We examine two types of barrier filters, one synchronizing through instruction cache lines and the other through data cache lines.
