Conference: International Conference on Parallel Architectures and Compilation Techniques (PACT)

XPoint Cache: Scaling Existing Bus-Based Coherence Protocols for 2D and 3D Many-Core Systems



Abstract

With multi-core processors now mainstream, the shift to many-core processors poses a new set of design challenges. In particular, the scalability of coherence protocols remains a significant challenge. While complex Network-on-Chip interconnect fabrics have been proposed, and in some cases implemented, most of the industry has instead slowly evolved existing coherence solutions to meet the needs of a growing number of cores. Industry's slow adoption of Network-on-Chip designs is due in large part to the significant effort needed to design and verify such systems. However, simply scaling bus-based coherence is not straightforward either, because bus contention and latency increase at large core counts. This paper proposes a new architecture, Xpoint, which scales existing bus-based snooping coherence protocols to 64-core systems without modifying them. Xpoint employs interleaved cache structures, together with detailed floorplanning and system analysis, to reduce contention at high core counts. Results show that the Xpoint system achieves average speedups of 28× and 35× over a single-core design on the Splash2 benchmarks for 32- and 64-core systems, respectively (a 1.6× improvement over a 64-core conventional bus). Xpoint is also evaluated as a 3D-stacked system to further reduce bus latency. In that configuration, results show speedups of 29× and 45× for 32- and 64-core systems, respectively (a 2.1× improvement over a 64-core conventional bus, and within 8% of the speedup of a 64-core system with an ideal interconnect). Measurements also show that Xpoint reduces the bus contention of a 64-core system to only 13% above that of an 8-core design (a 29× improvement over a 64-core conventional bus).
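The headline speedups can be put in perspective with a little arithmetic. The sketch below is not from the paper: it assumes "speedup" means single-core runtime divided by N-core runtime, and that "an N× improvement over a 64-core conventional bus" is a ratio of the two designs' speedups. All names in it are hypothetical; only the numbers come from the abstract.

```python
# Average Splash2 speedups over a single core, as reported in the abstract.
REPORTED = {
    ("Xpoint 2D", 32): 28.0,
    ("Xpoint 2D", 64): 35.0,
    ("Xpoint 3D", 32): 29.0,
    ("Xpoint 3D", 64): 45.0,
}

# Reported improvement of each 64-core Xpoint variant over a 64-core conventional bus.
GAIN_OVER_BUS_64 = {"Xpoint 2D": 1.6, "Xpoint 3D": 2.1}


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup achieved: S(p) / p."""
    return speedup / cores


def implied_bus_speedup(xpoint_speedup: float, gain: float) -> float:
    """Back out the conventional bus's speedup, ASSUMING the gain is a ratio of speedups."""
    return xpoint_speedup / gain


if __name__ == "__main__":
    for (design, cores), s in REPORTED.items():
        eff = parallel_efficiency(s, cores)
        print(f"{design:10s} {cores:2d} cores: speedup {s:4.1f}x, efficiency {eff:.1%}")
    for design, gain in GAIN_OVER_BUS_64.items():
        s = REPORTED[(design, 64)]
        print(f"{design}: implied 64-core conventional bus speedup ~{implied_bus_speedup(s, gain):.1f}x")
```

Under these assumptions, parallel efficiency falls from about 87.5% at 32 cores to about 54.7% at 64 cores for the 2D design, versus about 70.3% at 64 cores for the 3D design. Both variants imply a 64-core conventional-bus speedup of roughly 21 to 22×, so the two reported improvement ratios are mutually consistent.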
