IEEE Transactions on Parallel and Distributed Systems

Extending the Performance and Energy-Efficiency of Shared Memory Multicores with Nanophotonic Technology

Abstract

As the number of cores on a single chip increases exponentially, two design problems have become critical to energy efficiency and overall chip performance. The first is the on-chip network that carries intercore communication. The second is the cache coherence protocol that enables shared-memory programming. With traditional metal interconnects facing stringent energy constraints, researchers are pursuing disruptive alternatives such as nanophotonics to improve energy efficiency. Snoopy protocols can enforce cache coherence in multicores effectively; however, broadcasting every cache miss limits scalability and consumes excess energy. In this paper, we propose PULSE, a nanophotonic broadcast-tree-based network for snoopy cache-coherent multicores. Broadcasting an optical signal means splitting it, which carries an energy penalty. To limit that penalty, we direct the optical signal from the external laser so that only a subset of the requesters receives it. Furthermore, because a cache block is typically shared by only a few cores, we propose a multicast version of PULSE called multi-PULSE. It predicts the sharers for each L2 miss and morphs the broadcast network into a multicast network. We evaluate energy and performance using CACTI and SIMICS on 16-core and 64-core versions of PULSE and multi-PULSE. The workloads are the Splash-2, PARSEC, and SPEC CPU2006 benchmarks. We compare both designs against electrical networks, optical networks, and other cache-filtering techniques. Our results indicate that PULSE outperforms competitive electrical/optical networks by 60 percent in execution time. Multi-PULSE reduces average energy by 10 to 80 percent, even with a few mispredictions.
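The abstract does not say how multi-PULSE predicts sharers. As a rough illustration only, here is a minimal sketch of one common idea: a last-sharers table indexed by cache-block address. On an L2 miss, the snoop is multicast to the recorded sharers, or broadcast when there is no history. All names here are hypothetical and are not the paper's design. That covers `SharerPredictor`, `snoop`, and `BLOCK_BYTES`, as well as the 64-byte block size and the "re-broadcast on misprediction" recovery. The sketch also counts receivers reached as a crude energy proxy. This assumes that splitting an optical signal among more receivers costs more laser power.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 64  # assumed cache-block size (hypothetical)


@dataclass
class SharerPredictor:
    """Hypothetical last-sharers predictor: block address -> sharer bitmask."""
    num_cores: int
    table: dict = field(default_factory=dict)

    def others(self, requester: int) -> int:
        """Bitmask of every core except the requester (the broadcast set)."""
        return ((1 << self.num_cores) - 1) & ~(1 << requester)

    def predict(self, addr: int, requester: int) -> int:
        """Return the destination bitmask for a snoop request."""
        block = addr // BLOCK_BYTES
        if block not in self.table:
            return self.others(requester)  # no history: fall back to broadcast
        return self.table[block] & self.others(requester)  # multicast

    def update(self, addr: int, sharers: int, requester: int) -> None:
        """Record the actual sharers plus the requester, which now holds a copy.

        Assumes read sharing; a write miss would invalidate the other copies.
        """
        self.table[addr // BLOCK_BYTES] = sharers | (1 << requester)


def snoop(pred: SharerPredictor, addr: int, requester: int, actual_sharers: int):
    """Simulate one L2 miss; return (receivers_reached, mispredicted).

    actual_sharers is a bitmask of cores (excluding the requester) that hold
    the block. This sketch assumes a missed sharer is detected and the snoop
    is simply re-issued as a full broadcast.
    """
    dest = pred.predict(addr, requester)
    receivers = bin(dest).count("1")
    mispredicted = (actual_sharers & ~dest) != 0
    if mispredicted:
        # Recovery (assumed): re-broadcast to all other cores.
        receivers += bin(pred.others(requester)).count("1")
    pred.update(addr, actual_sharers, requester)
    return receivers, mispredicted
```

On a 16-core system, the first miss to a block has no history, so it is broadcast to all 15 other cores. A later miss to the same block by another core is multicast only to the recorded sharers, for example 3 cores instead of 15. If a core outside the prediction also holds the block, the sketch pays for both the multicast and a full re-broadcast. That is why the prediction accuracy matters for the energy savings.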