IEEE Transactions on Parallel and Distributed Systems

Moving address translation closer to memory in distributed shared-memory multiprocessors

Abstract

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer) before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared-memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache-only memory access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can migrate and replicate freely among nodes. As address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that a TLB merged with the shared memory is very effective, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Although such a merged TLB is highly effective, we also show that the TLB can be removed altogether in a system where address translation is done in memory, because the frequency of translations is very low.
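The "filtering effect" mentioned above can be made concrete with a small simulation. The sketch below is not from the paper; the direct-mapped, virtually addressed cache, its size, and the synthetic reference trace are all illustrative assumptions. It simply counts translations under two placements: a conventional TLB consulted on every reference before the first-level cache, versus translation performed at the memory side, where only cache misses need a translation.

```python
# Minimal sketch (illustrative assumptions, not the paper's simulator):
# count address translations when the TLB sits before the L1 cache
# (one per reference) versus at the memory side (one per cache miss).
import random

CACHE_LINES = 256      # direct-mapped, virtually indexed/tagged cache (256 x 64 B = 16 KB)
LINE_BYTES = 64
PAGE_BYTES = 4096

def simulate(trace):
    """Return (total references, references that miss the cache and reach memory)."""
    cache = [None] * CACHE_LINES        # each slot remembers the virtual line it holds
    refs = misses = 0
    for vaddr in trace:
        refs += 1
        line = vaddr // LINE_BYTES
        idx = line % CACHE_LINES
        if cache[idx] != line:          # miss: this request goes below the cache
            cache[idx] = line
            misses += 1
    return refs, misses

# Synthetic trace with some spatial locality: sequential bursts over a few pages.
random.seed(0)
trace = []
for _ in range(2000):
    base = random.randrange(16) * PAGE_BYTES
    trace.extend(base + off for off in range(0, 512, 8))

refs, misses = simulate(trace)
print(f"TLB before the L1 cache : {refs} translations (one per reference)")
print(f"Translation at memory   : {misses} translations (one per cache miss)")
print(f"Filtering effect        : {refs / misses:.1f}x fewer translations")
```

Under these assumptions, the translation rate seen by a memory-side mechanism is roughly the cache miss rate, which is why the abstract argues that the TLB can even be removed when translation is done in memory.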
