2012 IEEE 26th International Parallel and Distributed Processing Symposium

Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory

Abstract

The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. As the number of cores per chip and the number of threads per core increase, this difference in communication latencies grows. Therefore, it is important to map the threads of parallel applications in a way that takes the communication between them into account. In parallel applications based on the shared memory paradigm, communication is implicit and occurs through accesses to shared variables, which makes it difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, which requires modifications to the source code and drastically increases the overhead. In this paper, we introduce a new lightweight mechanism that detects the communication pattern of threads using the Translation Lookaside Buffer (TLB). Our mechanism relies entirely on hardware features, which makes thread mapping transparent to the programmer and allows the operating system to perform it dynamically. Moreover, it requires no time-consuming task such as simulation. We evaluated our mechanism with the NAS Parallel Benchmarks (NPB) and obtained an accurate representation of the communication patterns. From the detected communication patterns, we generated thread mappings using a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings improved performance by up to 15.3% and reduced the number of cache misses by up to 31.1%.
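The abstract does not spell out how TLB contents become a communication pattern, so the Python sketch below is only a toy model under our own assumptions, not the authors' mechanism. It assumes one thread per core, a private fully associative LRU TLB per core, and 4 KiB pages, and it counts one communication event whenever a TLB miss hits a page that another core's TLB currently holds. For the mapping step it swaps in an exhaustive maximum-weight pairing search in place of the Edmonds-based heuristic named in the paper (feasible only for a handful of threads), and it assumes cores 2i and 2i+1 share a cache. All names (`TLB`, `detect_communication`, `best_pairing`, `map_pairs_to_cores`, `PAGE_SIZE`) are hypothetical.

```python
from collections import OrderedDict
from typing import List, Sequence, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class TLB:
    """Toy fully associative TLB with LRU replacement (a simplification)."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.pages: "OrderedDict[int, None]" = OrderedDict()

    def lookup(self, page: int) -> bool:
        """Local access: return True on a hit and refresh its LRU position."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        return False

    def contains(self, page: int) -> bool:
        """Remote probe from another core: does not touch LRU order."""
        return page in self.pages

    def insert(self, page: int) -> None:
        if len(self.pages) >= self.entries:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = None


def detect_communication(trace: Sequence[Tuple[int, int]], n_threads: int,
                         tlb_entries: int = 64) -> List[List[int]]:
    """Build a symmetric communication matrix from (thread, address) accesses.

    Hypothetical rule: a TLB miss on a page that another core's TLB
    currently holds counts as one communication event between the two.
    """
    tlbs = [TLB(tlb_entries) for _ in range(n_threads)]
    matrix = [[0] * n_threads for _ in range(n_threads)]
    for thread, address in trace:
        page = address // PAGE_SIZE
        if tlbs[thread].lookup(page):
            continue  # TLB hit: no extra work
        for other in range(n_threads):  # TLB miss: probe the other TLBs
            if other != thread and tlbs[other].contains(page):
                matrix[thread][other] += 1
                matrix[other][thread] += 1
        tlbs[thread].insert(page)
    return matrix


def best_pairing(matrix: List[List[int]]) -> List[Tuple[int, int]]:
    """Maximum-weight perfect matching by exhaustive search.

    Stand-in for Edmonds' algorithm: exponential time, so only usable
    for a small, even number of threads.
    """
    n = len(matrix)
    if n % 2:
        raise ValueError("need an even number of threads")
    best_weight = -1
    best_pairs: List[Tuple[int, int]] = []

    def search(remaining: List[int], pairs: List[Tuple[int, int]], weight: int) -> None:
        nonlocal best_weight, best_pairs
        if not remaining:
            if weight > best_weight:
                best_weight, best_pairs = weight, list(pairs)
            return
        first = remaining[0]
        for k in range(1, len(remaining)):
            partner = remaining[k]
            pairs.append((first, partner))
            search(remaining[1:k] + remaining[k + 1:], pairs,
                   weight + matrix[first][partner])
            pairs.pop()

    search(list(range(n)), [], 0)
    return best_pairs


def map_pairs_to_cores(pairs: List[Tuple[int, int]], n_threads: int) -> List[int]:
    """Return core_of[thread]; assumes cores 2i and 2i+1 share a cache."""
    core_of = [0] * n_threads
    for i, (a, b) in enumerate(pairs):
        core_of[a], core_of[b] = 2 * i, 2 * i + 1
    return core_of


if __name__ == "__main__":
    # Threads 0 and 2 touch one region, threads 1 and 3 another.
    trace = [(t, ((0 if t % 2 == 0 else 100) + rep) * PAGE_SIZE)
             for rep in range(3) for t in range(4)]
    matrix = detect_communication(trace, n_threads=4)
    pairs = best_pairing(matrix)
    print(pairs)                         # → [(0, 2), (1, 3)]
    print(map_pairs_to_cores(pairs, 4))  # → [0, 2, 1, 3]
```

Probing remote TLBs only on a miss keeps the common case (a hit) free of extra work, which is one plausible reading of why a TLB-based detector can be lightweight; a real system would replace the exhaustive search with a polynomial-time matching algorithm.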