Analysis of shared memory misses and reference patterns

Abstract

Shared-bus computer systems permit the relatively simple and efficient implementation of cache consistency algorithms, but the shared bus is a bottleneck that limits performance. False sharing can be an important source of unnecessary traffic for invalidation-based protocols, and eliminating it can provide significant performance improvements. For many multiprocessor workloads, however, most misses are true sharing misses plus cold start misses. Regardless of the cause of cache misses, the largest fraction of bus traffic consists of words transferred between caches without being accessed, which we refer to as dead sharing. We establish here new methods for characterizing cache block reference patterns, and we measure how these patterns change with variation in workload and block size. Our results show that 42 percent of 64-byte cache blocks are invalidated before more than one word has been read from the block, and that 58 percent of blocks that have been modified have only a single word modified before an invalidation to the block occurs. Approximately 50 percent of blocks written and subsequently read by other caches show no use of the newly written information before the block is again invalidated. In addition to our general analysis of reference patterns, we also present a detailed analysis of dead sharing for each shared memory multiprocessor program studied. We find that the worst 10 blocks (those with the most total misses) from each of our traces contribute, on average, almost 50 percent of the false sharing misses and almost 20 percent of the true sharing misses. A relatively simple restructuring of four of our workloads based on analysis of these 10 worst blocks leads to a 21 percent reduction in overall misses and a 15 percent reduction in execution time. Permitting the block size to vary (as could be accomplished with a sector cache) shows that bus traffic can be reduced by 88 percent (for 64-byte blocks) while also decreasing the miss ratio by 35 percent.
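The notion of dead sharing defined in the abstract (words transferred into a cache but never accessed before the block is invalidated) can be illustrated with a minimal sketch. This is not the authors' measurement methodology: the trace format, the event names `fill`, `access`, and `inval`, the function `dead_word_fraction`, and the 4-byte word size are all assumptions made for illustration.

```python
WORDS_PER_BLOCK = 16  # 64-byte block of 4-byte words (word size is an assumption)


def dead_word_fraction(trace, words_per_block=WORDS_PER_BLOCK):
    """Fraction of transferred words never accessed during a block's lifetime.

    trace: iterable of (op, block, word) tuples for one cache, where op is
    "fill", "access", or "inval" (hypothetical event names); word is ignored
    for "fill" and "inval".
    """
    touched = {}      # block -> set of word offsets accessed since the last fill
    transferred = 0   # total words brought into the cache
    used = 0          # distinct words accessed within each block lifetime
    for op, block, word in trace:
        if op == "fill":
            # A refill without an intervening invalidation closes the old lifetime.
            used += len(touched.pop(block, ()))
            touched[block] = set()
            transferred += words_per_block
        elif op == "access" and block in touched:
            touched[block].add(word)
        elif op == "inval" and block in touched:
            used += len(touched.pop(block))
    # Blocks still resident when the trace ends are counted as well.
    used += sum(len(words) for words in touched.values())
    return (transferred - used) / transferred if transferred else 0.0


if __name__ == "__main__":
    toy_trace = [
        ("fill", 0, None), ("access", 0, 3), ("access", 0, 3), ("inval", 0, None),
        ("fill", 1, None), ("access", 1, 0), ("access", 1, 1),
    ]
    print(dead_word_fraction(toy_trace))
```

In the toy trace, two 16-word blocks are transferred and only three distinct words are ever accessed, so most of the transferred words count as dead under this simplified definition.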

Bibliographic details

Source: International Conference on Computer Design, 2000, 12 pages
Authors: Rothman J.B.; Smith A.J.; Institute of Electric and Electronic Engineer
Original format: PDF
Chinese Library Classification: automation technology, computer technology
