Published in: 7th ACM Computing Frontiers Conference 2010

On-chip Communication and Synchronization Mechanisms with Cache-Integrated Network Interfaces

Abstract

Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces (NIs), appropriate for scalable multicores, that combine the best of two worlds: the flexibility of caches and the efficiency of scratchpad memories. On-chip SRAM is configurably shared among caching, scratchpad, and virtualized NI functions. This paper presents our architecture, which provides local and remote scratchpad access, either to individual words or to multiword blocks through RDMA copy. Furthermore, we introduce event responses as a mechanism for software-configurable synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, memory barriers for explicitly selected accesses of arbitrary size, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and evaluated on-chip communication performance on the prototype as well as on a CMP simulator with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.
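The abstract does not spell out how the barrier for explicitly selected accesses works, so the following is only a toy software model of the general idea, not the paper's hardware design. It assumes a counter that is preset to the total number of words expected from a chosen set of transfers, is decremented as each piece arrives, and triggers a software-configured response when it reaches zero. The names `CompletionCounter`, `acknowledge`, and `rdma_copy`, and the chunk size, are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CompletionCounter:
    """Hypothetical model of a barrier over explicitly selected transfers."""

    expected_words: int                  # total words the selected transfers will deliver
    on_complete: Callable[[], None]      # software-configured response (assumption)
    remaining: int = field(init=False)

    def __post_init__(self) -> None:
        if self.expected_words <= 0:
            raise ValueError("expected_words must be positive")
        self.remaining = self.expected_words

    def acknowledge(self, nwords: int) -> None:
        """Record the arrival of nwords; fire the response when all have arrived."""
        if nwords <= 0 or nwords > self.remaining:
            raise ValueError("acknowledgement does not match outstanding words")
        self.remaining -= nwords
        if self.remaining == 0:
            self.on_complete()


def rdma_copy(src: List[int], dst: List[int], dst_offset: int,
              counter: CompletionCounter, chunk: int = 4) -> None:
    """Copy src into dst in chunk-sized pieces, acknowledging each piece."""
    for i in range(0, len(src), chunk):
        piece = src[i:i + chunk]
        dst[dst_offset + i:dst_offset + i + len(piece)] = piece
        counter.acknowledge(len(piece))


if __name__ == "__main__":
    events: List[str] = []
    scratchpad = [0] * 16                # stands in for a remote scratchpad region
    counter = CompletionCounter(12, lambda: events.append("done"))

    rdma_copy([1, 2, 3, 4, 5, 6, 7, 8], scratchpad, 0, counter)
    print(events)                        # response has not fired yet
    rdma_copy([9, 10, 11, 12], scratchpad, 8, counter)
    print(events)                        # response fires after the last word
```

In this sketch the consumer never polls individual transfers: it only observes the single response, which is the property that lets a barrier cover a group of accesses of arbitrary total size.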