Journal: Computer architecture news

SCORPIO: A 36-Core Research Chip Demonstrating Snoopy Coherence on a Scalable Mesh NoC with In-Network Ordering


Abstract

In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy coherence relies on ordered interconnects, which do not scale. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking. Prior techniques supporting ordering on arbitrary unordered networks are impractical for full multicore chip designs. We present SCORPIO, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering. Message delivery is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered. The architecture is designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns. Full-system 36- and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively. The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45 nm SOI technology, comprising 36 Freescale e200 Power Architecture™ cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers. The chip prototype achieves a post-synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19% of tile power.
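The idea of decoupling message delivery from ordering can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the SCORPIO implementation: the names `global_order`, `Node`, and `simulate` are hypothetical, the notification windows are given to every node up front (standing in for the fixed-latency notification network), each source sends at most one message per window, and ties within a window are broken by ascending source ID, a rule the abstract does not specify.

```python
import random


def global_order(notifications):
    """Derive the global order from the notification windows alone.

    notifications[w] is the set of source IDs that announced a message in
    time window w. Every node sees the same list, so every node derives
    the same order. Assumed tie-break: ascending source ID within a window.
    """
    return [(w, src)
            for w, sources in enumerate(notifications)
            for src in sorted(sources)]


class Node:
    """A receiver that buffers messages until they are next in global order."""

    def __init__(self, notifications):
        self.expected = global_order(notifications)
        self.next_idx = 0
        self.pending = {}    # (window, src) -> payload, arrived but not yet ordered
        self.processed = []  # payloads in the order this node consumed them

    def receive(self, window, src, payload):
        self.pending[(window, src)] = payload
        # Drain every buffered message that is now next in the global order.
        while (self.next_idx < len(self.expected)
               and self.expected[self.next_idx] in self.pending):
            key = self.expected[self.next_idx]
            self.processed.append(self.pending.pop(key))
            self.next_idx += 1


def simulate(num_nodes=4, seed=0):
    """Deliver the same messages to each node in a different random order."""
    rng = random.Random(seed)
    notifications = [{2, 0}, {1}, {3, 1, 0}]  # illustrative windows only
    messages = [(w, src, f"req-w{w}-n{src}")
                for w, sources in enumerate(notifications)
                for src in sources]
    nodes = [Node(notifications) for _ in range(num_nodes)]
    for node in nodes:
        arrival = messages[:]
        rng.shuffle(arrival)  # the data network may deliver in any order
        for w, src, payload in arrival:
            node.receive(w, src, payload)
    return [node.processed for node in nodes]
```

In this sketch every node consumes the messages in the same sequence regardless of arrival order, because the order is fixed by the notifications rather than by delivery time.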
