IEEE Transactions on Parallel and Distributed Systems

Moving address translation closer to memory in distributed shared-memory multiprocessors

Abstract

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer) before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared-memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache-only memory access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can migrate and replicate freely among nodes. As address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that a TLB merged with the shared memory is very effective, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Although such a merged TLB is highly effective, we also show that the TLB can be removed altogether in a system where address translation is done in memory, because the frequency of translations is very low.
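The "filtering effect" mentioned above can be made concrete with a small simulation. The sketch below is not from the paper; the direct-mapped, virtually addressed cache, its size, and the synthetic reference trace are all illustrative assumptions. It simply counts translations under two placements: a conventional TLB consulted on every reference before the first-level cache, versus translation performed at the memory side, where only cache misses need a translation.

```python
# Minimal sketch (illustrative assumptions, not the paper's simulator):
# count address translations when the TLB sits before the L1 cache
# (one per reference) versus at the memory side (one per cache miss).
import random

CACHE_LINES = 256      # direct-mapped, virtually indexed/tagged cache (256 x 64 B = 16 KB)
LINE_BYTES = 64
PAGE_BYTES = 4096

def simulate(trace):
    """Return (total references, references that miss the cache and reach memory)."""
    cache = [None] * CACHE_LINES        # each slot remembers the virtual line it holds
    refs = misses = 0
    for vaddr in trace:
        refs += 1
        line = vaddr // LINE_BYTES
        idx = line % CACHE_LINES
        if cache[idx] != line:          # miss: this request goes below the cache
            cache[idx] = line
            misses += 1
    return refs, misses

# Synthetic trace with some spatial locality: sequential bursts over a few pages.
random.seed(0)
trace = []
for _ in range(2000):
    base = random.randrange(16) * PAGE_BYTES
    trace.extend(base + off for off in range(0, 512, 8))

refs, misses = simulate(trace)
print(f"TLB before the L1 cache : {refs} translations (one per reference)")
print(f"Translation at memory   : {misses} translations (one per cache miss)")
print(f"Filtering effect        : {refs / misses:.1f}x fewer translations")
```

Under these assumptions, the translation rate seen by a memory-side mechanism is roughly the cache miss rate, which is why the abstract argues that the TLB can even be removed when translation is done in memory.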
