Conference: International Conference on Parallel Architectures and Compilation Techniques

Building expressive, area-efficient coherence directories


Abstract

Mainstream chip multiprocessors already include enough cores to make straightforward snooping-based cache coherence less appropriate. Further increases in core count will almost certainly require more sophisticated tracking of data sharing to minimize unnecessary messages and cache snooping. Directory-based coherence has been the standard solution for large-scale shared-memory multiprocessors and is a clear candidate for on-chip coherence maintenance. A vanilla directory design, however, uses storage inefficiently to keep coherence metadata, and the resulting storage overhead is high at larger scales. Reducing this overhead frees resources that can be redeployed for other purposes. In this paper, we exploit familiar characteristics of coherence metadata from novel angles and propose two practical techniques to increase the expressiveness of directory entries, particularly for chip multiprocessors. First, it is well known that the vast majority of cache lines have a small number of sharers. We exploit a related fact with a subtle but important difference: a significant portion of directory entries only need to track one node. We can thus use a hybrid representation of the sharer list for the whole set. Second, contiguous memory regions often share the same coherence characteristics and can be tracked by a single entry. We propose a multi-granular mechanism that does not rely on any profiling, compiler, or OS support to identify such regions. Moreover, it allows line and region entries to coexist in the same locations, thus making regions more applicable. We show that both techniques improve the expressiveness of directory entries and, when combined, can reduce directory storage by more than an order of magnitude with negligible loss of precision.
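The first idea, that many directory entries only need to track one node, can be illustrated with a small sketch. The abstract does not describe the actual encoding, so everything below is an assumption made for illustration: the class name `HybridEntry`, the 64-node system size, the promotion rule, and the choice to make the representation per entry rather than per directory set. It shows only that an entry with a single sharer can be stored as a node ID of about log2(N) bits, while an entry with several sharers falls back to a full N-bit vector.

```python
from dataclasses import dataclass
from math import ceil, log2
from typing import Optional, Set


def pointer_bits(num_nodes: int) -> int:
    """Bits needed to name one node out of num_nodes."""
    return max(1, ceil(log2(num_nodes)))


@dataclass
class HybridEntry:
    """Hypothetical directory entry: a single node ID or a full bit vector.

    Tag and coherence-state bits are ignored; only sharer storage is modeled.
    """
    num_nodes: int
    owner: Optional[int] = None   # compact form: the one tracked node
    vector: Optional[int] = None  # wide form: one bit per node

    def add_sharer(self, node: int) -> None:
        if not 0 <= node < self.num_nodes:
            raise ValueError("node out of range")
        if self.vector is not None:
            # Already in wide form: set the node's bit.
            self.vector |= 1 << node
        elif self.owner is None or self.owner == node:
            # Zero or one distinct sharer: stay in compact form.
            self.owner = node
        else:
            # Second distinct sharer: promote to the bit vector (assumed rule).
            self.vector = (1 << self.owner) | (1 << node)
            self.owner = None

    def sharers(self) -> Set[int]:
        if self.vector is not None:
            return {n for n in range(self.num_nodes) if (self.vector >> n) & 1}
        return set() if self.owner is None else {self.owner}

    def bits_used(self) -> int:
        """Sharer-tracking bits in the current form."""
        if self.vector is not None:
            return self.num_nodes
        return pointer_bits(self.num_nodes)


entry = HybridEntry(num_nodes=64)  # 64 nodes is an assumed system size
entry.add_sharer(5)
print(entry.bits_used())           # compact form: 6 bits
entry.add_sharer(9)
print(entry.bits_used())           # wide form: 64 bits
```

Under these assumptions, a single-sharer entry in a 64-node system needs 6 bits for sharer tracking instead of 64. How the paper mixes the two forms within a set, and how region entries share locations with line entries, is not specified in the abstract and is not modeled here.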
