International Conference on Computer Design

Analysis of shared memory misses and reference patterns


Abstract

Shared-bus computer systems permit the relatively simple and efficient implementation of cache consistency algorithms, but the shared bus is a bottleneck that limits performance. False sharing can be an important source of unnecessary traffic for invalidation-based protocols, and eliminating it can provide significant performance improvements. For many multiprocessor workloads, however, most misses are true sharing plus cold start misses. Regardless of the cause of cache misses, the largest fraction of bus traffic consists of words transferred between caches without being accessed, which we refer to as dead sharing. We establish here new methods for characterizing cache block reference patterns, and we measure how these patterns change with variation in workload and block size. Our results show that 42 percent of 64-byte cache blocks are invalidated before more than one word has been read from the block and that 58 percent of blocks that have been modified have only a single word modified before an invalidation to the block occurs. Approximately 50 percent of blocks written and subsequently read by other caches show no use of the newly written information before the block is again invalidated. In addition to our general analysis of reference patterns, we also present a detailed analysis of dead sharing for each shared memory multiprocessor program studied. We find that the worst 10 blocks (based on most total misses) from each of our traces contribute almost 50 percent of the false sharing misses and almost 20 percent of the true sharing misses (on average). A relatively simple restructuring of four of our workloads based on analysis of these 10 worst blocks leads to a 21 percent reduction in overall misses and a 15 percent reduction in execution time. Permitting the block size to vary (as could be accomplished with a sector cache) shows that bus traffic can be reduced by 88 percent (for 64-byte blocks) while also decreasing the miss ratio by 35 percent.
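
To make the notion of dead sharing concrete, the following Python sketch estimates the fraction of transferred words that are never accessed before their block is invalidated. It is an illustration only and not the authors' measurement method. The trace format `(cpu, op, byte_addr)`, the 4-byte word size, the infinite per-processor caches, the write-allocate write-invalidate protocol, and the name `dead_word_fraction` are all assumptions made for this example. As a further simplification it counts every block fill as a transfer, whether the data would come from memory or from another cache.

```python
from collections import defaultdict

WORD_BYTES = 4  # assumed word size; not stated in the abstract


def dead_word_fraction(trace, block_bytes=64):
    """Estimate the share of transferred words never touched before invalidation.

    trace: iterable of (cpu, op, byte_addr) tuples, op is 'R' or 'W'
           (hypothetical format chosen for this sketch).
    Model: infinite caches, write-allocate, write-invalidate coherence.
    """
    words_per_block = block_bytes // WORD_BYTES
    # resident[cpu][block] -> set of word indices touched since the last fill
    resident = defaultdict(dict)
    transferred = 0  # words moved by block fills
    dead = 0         # words moved but never accessed before the copy was lost

    def drop(cpu, block):
        nonlocal dead
        touched = resident[cpu].pop(block)
        dead += words_per_block - len(touched)

    for cpu, op, addr in trace:
        block, word = divmod(addr // WORD_BYTES, words_per_block)
        if block not in resident[cpu]:
            # Miss: the whole block is transferred into this cache.
            resident[cpu][block] = set()
            transferred += words_per_block
        resident[cpu][block].add(word)
        if op == 'W':
            # A write invalidates every other cached copy of the block.
            for other in list(resident):
                if other != cpu and block in resident[other]:
                    drop(other, block)

    # Copies still resident at the end of the trace are tallied as well.
    for cpu in list(resident):
        for block in list(resident[cpu]):
            drop(cpu, block)

    return dead / transferred if transferred else 0.0


# Two processors ping-ponging on one 16-byte block, each using a single word.
EXAMPLE_TRACE = [(0, 'R', 0), (1, 'W', 4), (0, 'R', 0)]
```

With `EXAMPLE_TRACE` and `block_bytes=16`, three fills move 12 words and only 3 of them are ever accessed, so the function returns 0.75. In this toy model, shrinking the block lowers the dead fraction, which is the effect the variable block size result points to.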
