G-TSC: Timestamp Based Coherence for GPUs

Abstract

Cache coherence has been studied extensively in the context of chip multiprocessors (CMPs). It is well known that conventional directory-based and snooping coherence protocols generate considerable coherence traffic as the number of hardware thread contexts increases. Since GPUs support hundreds or even thousands of threads, conventional coherence mechanisms applied to GPUs will exacerbate the bandwidth constraints that GPUs already face. Recognizing this constraint, prior work has proposed time-based coherence protocols. The main idea is to assign a lease period to each accessed cache block; once the lease expires, the cache block is self-invalidated. However, time-based coherence protocols require globally synchronized clocks. Furthermore, this approach may increase execution stalls, since threads have to wait to access data with an unexpired lease. Tardis is a timestamp-based coherence protocol that was proposed recently to alleviate the need for global clocks in CPUs. This paper builds on this prior work and proposes G-TSC, a novel cache coherence protocol for GPUs that is based on timestamp ordering. G-TSC conducts its coherence transactions in logical time. This work demonstrates the challenges of adopting timestamp coherence for GPUs, which support massive thread parallelism and have unique microarchitectural features. It then presents a number of solutions that tackle these GPU-centric challenges. An evaluation of G-TSC implemented in the GPGPU-Sim simulation framework shows that G-TSC outperforms time-based coherence by 38% with release consistency.
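The lease idea and its logical-time variant can be illustrated with a small model. The following is a minimal sketch of Tardis-style timestamp bookkeeping, not the G-TSC protocol itself: the names `SharedLine`, `Core`, `load`, `store`, and the `LEASE` value are hypothetical, and GPU-specific details (write-through L1 caches, warps, memory partitions) are left out. It assumes each shared line keeps a write timestamp `wts` and a lease end `rts`, and each core keeps its own logical time `pts`.

```python
from dataclasses import dataclass, field

LEASE = 10  # hypothetical lease length, in logical time units


@dataclass
class SharedLine:
    value: int = 0
    wts: int = 0  # logical time of the last write
    rts: int = 0  # logical time up to which read leases are valid


@dataclass
class Core:
    pts: int = 0  # this core's current logical time
    cache: dict = field(default_factory=dict)  # addr -> (value, wts, rts)


def load(core, l2, addr):
    """Read addr, reusing the private copy while its lease covers core.pts."""
    cached = core.cache.get(addr)
    if cached is not None and core.pts <= cached[2]:
        value, wts, _ = cached  # lease still valid in logical time
    else:
        # Lease expired or line missing: fetch and extend the lease.
        line = l2.setdefault(addr, SharedLine())
        line.rts = max(line.rts, line.wts + LEASE, core.pts + LEASE)
        core.cache[addr] = (line.value, line.wts, line.rts)
        value, wts = line.value, line.wts
    core.pts = max(core.pts, wts)  # the read is ordered after the write it saw
    return value


def store(core, l2, addr, value):
    """Write addr without waiting: jump logical time past all leases."""
    line = l2.setdefault(addr, SharedLine())
    core.pts = max(core.pts, line.rts + 1)
    line.value, line.wts, line.rts = value, core.pts, core.pts
    core.cache[addr] = (value, line.wts, line.rts)


l2 = {}
reader, writer = Core(), Core()
first = load(reader, l2, 0x40)   # reader obtains a lease on the line
store(writer, l2, 0x40, 7)       # writer does not stall on that lease
second = load(reader, l2, 0x40)  # reader may still use its leased copy
reader.pts = writer.pts          # e.g. after synchronizing with the writer
third = load(reader, l2, 0x40)   # lease has expired in logical time
```

In this sketch the writer never waits for the reader's lease to run out. It advances its own logical time past the lease instead, which is the property that removes both the global clock and the write stalls of a physical-time lease. The reader's second load returning the old value is still consistent, because that read is ordered before the write in logical time.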