
An efficient cache design for scalable glueless shared-memory multiprocessors

Abstract

Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. As a result, the main-memory access needed to retrieve the sharing status of a block generally lies on the critical path of every cache miss, increasing its latency. Given the ever-increasing distance to memory, these cache coherence protocols are far from optimal in terms of performance. On the other hand, shared-memory multiprocessors formed by connecting chips that integrate the processor, caches, coherence logic, switch and memory controller through a low-cost, low-latency point-to-point network (glueless shared-memory multiprocessors) are a reality.

In this work, we propose a novel design for the L2 cache level, at which coherence has to be maintained, aimed at glueless shared-memory multiprocessors. Our proposal splits the cache structure into two parts: one that stores data and directory information for the blocks requested by the local processor, and another that stores only directory information for blocks accessed by remote processors. Using this cache scheme we remove the directory from main memory. Besides saving memory space, our proposal brings very significant reductions in cache-miss latency (speed-ups of 3.0 on average), which translate into reductions in applications' execution time of 31% on average.
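The two-part organization described in the abstract can be pictured with a toy model. The sketch below is illustrative only and is not the authors' design: the names `SplitL2Cache`, `LocalEntry`, `local_fill`, `remote_access` and `sharers_of` are hypothetical, the sharer set stands in for whatever directory encoding the paper actually uses, and capacity limits, replacement and coherence states are not modeled. Moving a directory-only entry into the local part on a local fill is likewise an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalEntry:
    """Block requested by the local processor: data plus directory information."""
    data: bytes
    sharers: set[int] = field(default_factory=set)  # hypothetical sharer list


@dataclass
class SplitL2Cache:
    """Toy L2 with a data+directory part and a directory-only part (assumed layout)."""
    local_part: dict[int, LocalEntry] = field(default_factory=dict)
    remote_dir_part: dict[int, set[int]] = field(default_factory=dict)

    def local_fill(self, addr: int, data: bytes) -> None:
        # Block fetched for the local processor: keep data and directory state.
        # Assumption: an existing directory-only entry moves into the local part.
        sharers = self.remote_dir_part.pop(addr, set())
        self.local_part[addr] = LocalEntry(data, sharers)

    def remote_access(self, addr: int, node: int) -> None:
        # Block accessed by a remote processor: record directory information only.
        if addr in self.local_part:
            self.local_part[addr].sharers.add(node)
        else:
            self.remote_dir_part.setdefault(addr, set()).add(node)

    def sharers_of(self, addr: int) -> set[int]:
        # Sharing status is read from the cache; no main-memory directory exists.
        if addr in self.local_part:
            return set(self.local_part[addr].sharers)
        return set(self.remote_dir_part.get(addr, set()))
```

In this model, a call such as `cache.remote_access(0x40, 2)` followed by `cache.sharers_of(0x40)` resolves the sharing status entirely from the cache structure, which is the property the abstract credits for the reduced miss latency.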

Bibliographic details

Source: Conference on Computing frontiers, 2006, pp. 321-330 (10 pages)
Authors: Alberto Ros; Manuel E. Acacio; Jose M. Garcia
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: memory wall

