Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory

Abstract

The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. As the number of cores per chip and the number of threads per core increase, the difference between these communication latencies grows. It is therefore important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, communication is implicit and occurs through accesses to shared variables, which makes the communication pattern between the threads difficult to detect. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this paper, we introduce a new lightweight mechanism to detect the communication pattern of threads using the Translation Lookaside Buffer (TLB). Our mechanism relies entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time-consuming task, such as simulation, is required. We evaluated our mechanism with the NAS Parallel Benchmarks (NPB) and achieved an accurate representation of the communication patterns. Using the detected communication patterns, we generated thread mappings using a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3%, reducing the number of cache misses by up to 31.1%.
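The two steps named in the abstract (detecting communication from TLB contents, then matching threads that communicate heavily) can be illustrated with a small sketch. This is not the authors' implementation: the paper's mechanism works at the hardware and operating system level, whereas the code below operates on hypothetical TLB snapshots given as sets of virtual page numbers. The function names `communication_matrix` and `best_pairing` are made up for illustration, and the exhaustive pairing search is a simple stand-in for the Edmonds-based heuristic, practical only for a handful of threads.

```python
from itertools import combinations


def communication_matrix(tlb_snapshots):
    """Count, for each pair of threads, the pages present in both TLB snapshots."""
    n = len(tlb_snapshots)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(tlb_snapshots[i] & tlb_snapshots[j])
        matrix[i][j] = matrix[j][i] = shared
    return matrix


def best_pairing(matrix):
    """Exhaustively find the thread pairing with the largest total weight.

    Assumes an even number of threads. This brute-force search is a
    stand-in for an Edmonds-style matching, not the paper's heuristic.
    """
    def search(remaining):
        if not remaining:
            return 0, []
        first, rest = remaining[0], remaining[1:]
        best_weight, best_pairs = -1, []
        for k, partner in enumerate(rest):
            # Pair `first` with `partner`, then pair up everyone else.
            weight, pairs = search(rest[:k] + rest[k + 1:])
            weight += matrix[first][partner]
            if weight > best_weight:
                best_weight, best_pairs = weight, [(first, partner)] + pairs
        return best_weight, best_pairs

    return search(list(range(len(matrix))))[1]


# Hypothetical TLB contents (virtual page numbers) for four threads.
snapshots = [
    {0x10, 0x11, 0x12},
    {0x20, 0x21, 0x30},
    {0x11, 0x12, 0x13},
    {0x20, 0x21, 0x22},
]
comm = communication_matrix(snapshots)
pairs = best_pairing(comm)
print(comm)
print(pairs)
```

In this toy input, threads 0 and 2 share two pages, as do threads 1 and 3, so a mapping that places each of those pairs on cores sharing a cache would be preferred.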


Bibliographic details

Source
2012 IEEE 26th International Parallel and Distributed Processing Symposium | 2012 | pp. 532-543 | 12 pages
Conference location: Shanghai (CN)
Authors
Cruz Eduardo H.M.; Diener Matthias; Navaux Philippe O.A.
Original format: PDF
Language: English
Chinese Library Classification: TP311.133

Similar literature


1. Communication-aware thread mapping using the translation lookaside buffer [J]. Eduardo H. M. Cruz, Matthias Diener, Philippe O. A. Navaux. Concurrency, practice and experience. 2015, no. 17

2. NestedMP: Enabling cache-aware thread mapping for nested parallel shared memory applications [J]. He Jiangzhou, Chen Wenguang, Tang Zhizhong. Parallel Computing. 2016, issue Jana

3. Assessing Parallel Thread Mapping Approaches on Shared Memory SMT Architectures [J]. Pinho Amorim Amanda Maria, Cota de Freitas Henrique. Latin America Transactions, IEEE (Revista IEEE America Latina). 2019, no. 2

4. Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory [C]. Cruz Eduardo H.M., Diener Matthias, Navaux Philippe O.A. IEEE International Parallel Distributed Processing Symposium. 2012

5. Thread mapping using system-level model for shared memory multicores [D]. Mitra, Reshmi. 2015

6. Performance of parallel FDTD method for shared- and distributed-memory architectures: Application to bioelectromagnetics [O]. Miguel Ruiz-Cabello N., Maksims Abaļenkovs, Luis M. Diaz Angulo, 2020

7. Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives [O]. Talita Perciano, Colleen Heinemann, David Camp, 2020
