ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit

Abstract

Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced a complex memory hierarchy, consisting of several cores organized hierarchically with multiple cache levels and NUMA nodes. Such memory hierarchies can affect the performance and energy efficiency of parallel applications, as memory access locality becomes increasingly important. To improve locality, analyzing the memory access behavior of parallel applications is critical for mapping threads and data. Nevertheless, most previous work relies on indirect information about memory accesses, or does not combine thread and data mapping, resulting in less accurate mappings.

In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that allows it to detect the memory access behavior in hardware. With this information, the operating system can perform online mapping without any previous knowledge about the behavior of the application. In an evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance improved by up to 35.7% (10.0% on average) and energy efficiency improved by up to 11.9% (4.1% on average). These improvements result from a substantial reduction in cache misses and interconnection traffic.
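
The abstract does not describe the mapping policy itself. As a minimal sketch, assuming the hardware exposes per-page access counts for each thread (the `access_counts` dictionary and all function names below are hypothetical), an operating system could build a thread sharing matrix, place threads that share the most pages on the same NUMA node, and then move each page to the node whose threads access it most. This is a generic greedy heuristic for combined thread and data mapping, not the SAMMU algorithm from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: access_counts[page][thread] = accesses by that thread to that page,
# standing in for the per-page statistics a sharing-aware MMU might collect.


def sharing_matrix(access_counts, num_threads):
    """Count, for every pair of threads, how many pages both of them accessed."""
    matrix = [[0] * num_threads for _ in range(num_threads)]
    for counts in access_counts.values():
        sharers = sorted(t for t, c in counts.items() if c > 0)
        for a, b in combinations(sharers, 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def map_threads(matrix, cores_per_node):
    """Greedy thread mapping: repeatedly pair the two unplaced threads that share
    the most pages, then fill NUMA nodes with these pairs in order.
    Assumes an even cores_per_node so that a pair never straddles two nodes."""
    unplaced = set(range(len(matrix)))
    groups = []
    while len(unplaced) > 1:
        a, b = max(combinations(sorted(unplaced), 2),
                   key=lambda p: matrix[p[0]][p[1]])
        groups.append((a, b))
        unplaced -= {a, b}
    if unplaced:  # odd thread count: the last thread is placed alone
        groups.append((unplaced.pop(),))
    thread_node, slot = {}, 0
    for group in groups:
        for t in group:
            thread_node[t] = slot // cores_per_node
            slot += 1
    return thread_node


def map_pages(access_counts, thread_node):
    """Data mapping: place each page on the NUMA node whose threads access it most."""
    page_node = {}
    for page, counts in access_counts.items():
        per_node = defaultdict(int)
        for t, c in counts.items():
            per_node[thread_node[t]] += c
        if per_node:  # skip pages with no recorded accesses
            page_node[page] = max(per_node, key=per_node.get)
    return page_node


# Usage: 4 threads on 2 NUMA nodes with 2 cores each.
# Threads 0 and 2 share two pages; threads 1 and 3 share one page.
access_counts = {
    0x1000: {0: 10, 2: 5},
    0x2000: {0: 3, 2: 7},
    0x3000: {1: 4, 3: 4},
    0x4000: {1: 9},
}
matrix = sharing_matrix(access_counts, num_threads=4)
thread_node = map_threads(matrix, cores_per_node=2)
page_node = map_pages(access_counts, thread_node)
print(thread_node, page_node)
```

Mapping threads first and then deriving page placement from the resulting thread placement keeps the two decisions consistent, which reflects the abstract's point that thread and data mapping should be combined rather than performed independently.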
