Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

In-Network Coherence Filtering: Snoopy coherence without broadcasts

Abstract

With transistor miniaturization yielding an abundance of on-chip resources and uniprocessor designs providing diminishing returns, the industry has moved beyond single-core microprocessors and embraced the many-core wave. Scalable cache coherence protocol implementations are needed to let cores share data quickly and to drive the many-core revolution forward. Snoopy coherence protocols, if realizable, have two desirable properties: low storage overhead and no indirection delay on cache-to-cache accesses. Several proposals, including Token Coherence (TokenB), Uncorq, Intel QPI, INSO, and Timestamp Snooping, address the ordering of requests in snoopy protocols and make them realizable on unordered networks. However, snoopy protocols still incur broadcast overhead, because every coherence request is sent to all cores in the system, with substantial network bandwidth and power costs. In this work, we propose embedding small in-network coherence filters inside on-chip routers that dynamically track sharing patterns among cores. This sharing information is used to filter out redundant snoop requests headed toward cores that do not share the data. Filtering these useless messages saves network bandwidth and power and makes snoopy protocols on many-core systems truly scalable. On 16-processor chip multiprocessor (CMP) systems running parallel applications, our in-network coherence filters reduce the total number of snoops in the system by 41.9% on average, thereby reducing total network traffic by 25.4%. On 64-processor CMP systems, the filtering technique reduces the total number of snoops by 46.5% on average, which in turn reduces total network traffic by 27.3% on average.
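The abstract does not describe how the filters are organized, so the following Python sketch is only a hypothetical, simplified model of the idea, not the paper's design. It assumes a width x height mesh in which a snoop is broadcast along an XY tree (along the source's row, then up and down each column), and each router keeps, per memory region, direction bits (`E`, `W`, `N`, `S`, `L`) recording whether some core reachable through that port may share the region. A router forwards a snoop only on ports whose bit is set. The names `Mesh`, `record_sharer`, `snoop`, and `REGION_BITS` are illustrative. The model also assumes every router observes every sharer-join event, uses an unbounded table, and never clears bits, so table capacity, eviction, and sharer removal are deliberately left out.

```python
from collections import defaultdict

# Direction bits kept per region at each router (illustrative encoding).
E, W, N, S, L = 1, 2, 4, 8, 16
ALL = E | W | N | S | L
REGION_BITS = 10  # hypothetical region size: 1 KiB


class Mesh:
    """Hypothetical width x height mesh whose routers filter broadcast snoops."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        # filters[node][region] -> bitmask of ports that may lead to a sharer
        self.filters = [defaultdict(int) for _ in range(width * height)]

    def coords(self, node):
        return node % self.width, node // self.width

    def record_sharer(self, addr, core):
        """Mark, at every router, the port through which `core` is reachable."""
        region = addr >> REGION_BITS
        cx, cy = self.coords(core)
        for node, table in enumerate(self.filters):
            x, y = self.coords(node)
            if cx > x:
                table[region] |= E  # sharer lies in some column to the east
            elif cx < x:
                table[region] |= W  # sharer lies in some column to the west
            elif cy < y:
                table[region] |= N  # same column, further north
            elif cy > y:
                table[region] |= S  # same column, further south
            else:
                table[region] |= L  # the router's own core

    def snoop(self, src, addr, filtered=True):
        """Broadcast along an XY tree; return (cores snooped, links traversed)."""
        region = addr >> REGION_BITS
        sx, sy = self.coords(src)
        delivered, links = set(), 0
        stack = [(sx, sy, E | W | N | S)]  # (x, y, ports allowed by the tree)
        while stack:
            x, y, allowed = stack.pop()
            node = y * self.width + x
            mask = self.filters[node].get(region, 0) if filtered else ALL
            if node != src and mask & L:
                delivered.add(node)  # local core may hold the data: snoop it
            for d, dx, dy in ((E, 1, 0), (W, -1, 0), (N, 0, -1), (S, 0, 1)):
                nx, ny = x + dx, y + dy
                if not (allowed & d and mask & d):
                    continue  # not a tree edge, or no sharer downstream
                if not (0 <= nx < self.width and 0 <= ny < self.height):
                    continue
                links += 1
                # Routers in the source row keep going east/west and fan out
                # vertically; routers in a column only continue north or south.
                stack.append((nx, ny, d | N | S if d in (E, W) else d))
        return delivered, links


if __name__ == "__main__":
    mesh = Mesh(4, 4)
    mesh.record_sharer(0x1000, 5)   # core 5 caches a block in region 4
    mesh.record_sharer(0x1000, 14)  # so does core 14
    print(mesh.snoop(0, 0x1000))                  # ({5, 14}, 6)
    print(mesh.snoop(0, 0x1000, filtered=False))  # all 15 other cores, 15 links
```

In this toy run, the filtered snoop from core 0 reaches only cores 5 and 14 over 6 links, while the unfiltered broadcast reaches all 15 other cores over 15 links. Because bits are only ever set and regions are coarser than cache blocks, the model can still forward some unnecessary snoops, but it never withholds a snoop from a core that was recorded as a sharer.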