Journal: IEEE Transactions on Parallel and Distributed Systems

Kernel-Based Thread and Data Mapping for Improved Memory Affinity

Abstract

Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve memory affinity. First, by mapping threads that share data to cores with a shared cache, cache usage and communication performance are optimized. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques thread mapping and data mapping, respectively. Thread and data mapping should be performed in an integrated way to achieve a compounding effect that results in larger overall improvements. Previous work in this area requires expensive tracing operations to perform the mapping, or requires changes to the hardware or to the parallel application. In this paper, we propose kMAF, a mechanism that performs integrated thread and data mapping in the kernel. kMAF uses the page faults of parallel applications to characterize their memory access behavior and performs the mapping during the execution of the application based on the detected behavior. In an evaluation with a large set of parallel benchmarks executing on three NUMA architectures, kMAF achieved substantial performance and energy efficiency improvements, close to an Oracle-based mechanism and significantly higher than previous proposals.
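To make the two techniques concrete, here is a minimal user-space sketch in Python. It is not kMAF's kernel implementation, and the abstract gives no algorithmic detail, so everything here is an assumption for illustration: the fault trace format `(thread_id, page_id)`, the function names, the greedy pairing heuristic, and the "most faults wins" page placement. It also ignores the controller-overload check that the abstract mentions.

```python
from collections import defaultdict
from itertools import combinations


def record_faults(faults, node_of_thread):
    """Summarize a page-fault trace (hypothetical format: (thread_id, page_id))."""
    sharers = defaultdict(set)  # page -> threads that faulted on it
    node_hits = defaultdict(lambda: defaultdict(int))  # page -> node -> faults
    for tid, page in faults:
        sharers[page].add(tid)
        node_hits[page][node_of_thread[tid]] += 1
    return sharers, node_hits


def communication_matrix(sharers):
    """Count, for each thread pair, how many pages both threads touched."""
    comm = defaultdict(int)
    for threads in sharers.values():
        for a, b in combinations(sorted(threads), 2):
            comm[(a, b)] += 1
    return comm


def pair_threads(comm):
    """Thread mapping (assumed greedy heuristic): pair the threads that share
    the most pages, so each pair can be placed on cores with a shared cache.
    Threads that never share a page with a free partner are left unpaired."""
    used, pairs = set(), []
    for (a, b), _count in sorted(comm.items(), key=lambda kv: -kv[1]):
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


def place_pages(node_hits):
    """Data mapping (simplified): put each page on the NUMA node that
    faulted on it most often. Load on the memory controller is ignored."""
    return {page: max(hits, key=hits.get) for page, hits in node_hits.items()}


if __name__ == "__main__":
    # Toy trace: threads 0 and 1 run on node 0, threads 2 and 3 on node 1.
    nodes = {0: 0, 1: 0, 2: 1, 3: 1}
    trace = [(0, 10), (2, 10), (1, 20), (3, 20), (1, 21), (3, 21),
             (2, 30), (0, 30), (2, 30)]
    sharers, node_hits = record_faults(trace, nodes)
    print("thread pairs:", pair_threads(communication_matrix(sharers)))
    print("page placement:", place_pages(node_hits))
```

In this sketch both decisions are derived from the same fault trace, which mirrors the integrated approach the abstract argues for: the pairing tells a scheduler which threads to co-locate, and the placement tells a memory manager where each page should live once the threads have moved.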