Kernel-Based Thread and Data Mapping for Improved Memory Affinity

Matthias Diener; Eduardo H. M. Cruz; Marco A. Z. Alves; Philippe O. A. Navaux; Anselm Busse; Hans-Ulrich Heiss


Abstract

Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve memory affinity: First, by mapping threads that share data to cores with a shared cache, cache usage and communication performance are optimized. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques thread mapping and data mapping, respectively. Thread and data mapping should be performed in an integrated way to achieve a compounding effect that results in higher improvements overall. Previous work in this area requires expensive tracing operations to perform the mapping, or requires changes to the hardware or to the parallel application. In this paper, we propose kMAF, a mechanism that performs integrated thread and data mapping in the kernel. kMAF uses the page faults of parallel applications to characterize their memory access behavior and performs the mapping during the execution of the application based on the detected behavior. In an evaluation with a large set of parallel benchmarks executing on three NUMA architectures, kMAF achieved substantial performance and energy efficiency improvements, close to an Oracle-based mechanism and significantly higher than previous proposals.
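
The abstract does not spell out kMAF's algorithm, so the following is only a user-space sketch of the general idea, not the authors' kernel implementation. It assumes a hypothetical trace of page-fault samples given as (thread, page) pairs and a hypothetical dictionary of current thread-to-node placements. From the trace it derives a sharing matrix (threads that fault on the same page are assumed to share data), picks a home node for each page by fault count, and greedily pairs the threads that share the most. The names `analyze_faults` and `pair_threads` are illustrative, and the sketch ignores the load-balancing constraint on memory controllers that the abstract mentions.

```python
from collections import defaultdict
from itertools import combinations


def analyze_faults(faults, thread_node):
    """Derive sharing and page placement from page-fault samples.

    faults: iterable of (thread_id, page_id) pairs (hypothetical trace format).
    thread_node: dict mapping thread_id to the NUMA node it currently runs on.
    Returns (sharing, page_home).
    """
    # page -> thread -> number of faults that thread took on the page
    page_threads = defaultdict(lambda: defaultdict(int))
    for tid, page in faults:
        page_threads[page][tid] += 1

    # Sharing matrix: two threads faulting on the same page are assumed to
    # communicate; the weight is the smaller of their two fault counts.
    sharing = defaultdict(int)
    for counts in page_threads.values():
        for a, b in combinations(sorted(counts), 2):
            sharing[(a, b)] += min(counts[a], counts[b])

    # Data mapping: each page goes to the node whose threads faulted on it
    # most often (ties resolve to the lowest node id).
    page_home = {}
    for page, counts in page_threads.items():
        per_node = defaultdict(int)
        for tid, n in counts.items():
            per_node[thread_node[tid]] += n
        page_home[page] = max(sorted(per_node), key=per_node.get)

    return dict(sharing), page_home


def pair_threads(sharing):
    """Greedy thread mapping: pair the threads with the heaviest sharing first.

    Each returned pair is a candidate for two cores with a shared cache.
    """
    placed = set()
    pairs = []
    for (a, b), _ in sorted(sharing.items(), key=lambda kv: (-kv[1], kv[0])):
        if a not in placed and b not in placed:
            pairs.append((a, b))
            placed.update((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up samples: threads 0 and 1 share page A, threads 2 and 3 share page B.
    faults = [(0, "A"), (1, "A"), (0, "A"), (1, "A"),
              (2, "B"), (3, "B"), (3, "B"), (0, "C"), (2, "C")]
    sharing, page_home = analyze_faults(faults, {0: 0, 1: 0, 2: 1, 3: 1})
    print(pair_threads(sharing))
    print(page_home)
```

In a real system the resulting pairs and page homes would still have to be applied through operating-system facilities for thread affinity and page migration; that step is outside this sketch.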


Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems | 2016, Issue 9 | pp. 2653-2666 | 14 pages
Authors
Matthias Diener; Eduardo H. M. Cruz; Marco A. Z. Alves; Philippe O. A. Navaux; Anselm Busse; Hans-Ulrich Heiss
Author affiliation

Informatics Institute of the Federal University of Rio Grande do Sul, Porto Alegre, Brazil;

Original format: PDF
Language: English
Keywords
Cache memories; NUMA; data mapping; memory affinity; shared memory; thread mapping; virtual memory;


Similar literature

1. Affinity-Based Thread and Data Mapping in Shared Memory Systems [J]. Diener Matthias, Cruz Eduardo H. M., Alves Marco A. Z. ACM Computing Surveys. 2017, Issue 4
2. Improving runtime performance and energy consumption through balanced data locality with NUMA-BTLP and NUMA-BTDM static algorithms for thread classification and thread type-aware mapping [J]. International Journal of Computational Science and Engineering. 2020, Issue 2a3
3. Threads and Data Mapping: Affinity Analysis for Traffic Reduction [J]. Qi Hu, Peng Liu, Michael C. Huang. IEEE Computer Architecture Letters. 2016, Issue 2
4. Thread affinity mapping for irregular data access on shared Cache GPGPU [C]. Hsien-Kai Kuo, Kuan-Ting Chen, Lai Bo-Cheng Charles. 2012 17th Asia and South Pacific Design Automation Conference. 2012
5. Thread mapping using system-level model for shared memory multicores [D]. Mitra, Reshmi. 2015
6. Mapping differential interactomes by affinity purification coupled with data independent mass spectrometry acquisition [O]. Jean-Philippe Lambert, Gordana Ivosev, Amber L. Couzens
7. DagTM: An Energy-Efficient Threads Grouping Mapping for Many-Core Systems Based on Data Affinity [O]. Tao Ju, Xiaoshe Dong, Heng Chen. 2016
