Published in: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum

Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms

Abstract

In parallel programs, the tasks of a given application must cooperate to accomplish the required computation. However, the communication time between tasks may differ depending on which cores they execute on and how the memory hierarchy and interconnection are used. The problem is even more important in multi-core machines with NUMA characteristics, since remote accesses impose high overhead, making these machines more sensitive to thread and data mapping. In this context, process mapping is a technique that provides performance gains by improving the use of resources such as interconnections, main memory and cache memory. The problem of finding the best mapping is considered NP-hard. Furthermore, in shared-memory environments there is the additional difficulty of finding the communication pattern, which is implicit and occurs through memory accesses. This work aims to provide a static mapping method for NUMA architectures that does not require any prior knowledge of the application. Different metrics were adopted, and a heuristic method based on the Edmonds matching algorithm was used to obtain the mapping. To evaluate our proposal, we use the NAS Parallel Benchmarks (NPB) and two modern multi-core NUMA machines. Results show performance gains of up to 75% compared to the native scheduler and memory allocator of the operating system.
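
The idea of recovering an implicit communication pattern from memory accesses and then pairing threads by matching can be illustrated with a small sketch. This is not the method of the paper: the abstract does not specify the metrics or the heuristic, so the trace format `(thread_id, address)`, the 64-byte block size, the `min`-based sharing weight, and the function names `communication_matrix` and `best_pairing` are all assumptions made for illustration. The exhaustive search below is a stand-in for the Edmonds matching algorithm and is only practical for a handful of threads.

```python
from collections import defaultdict
from itertools import combinations


def communication_matrix(trace, num_threads, block_size=64):
    """Estimate pairwise communication from a memory access trace.

    trace: iterable of (thread_id, address) pairs (assumed format).
    Two threads are taken to communicate when they touch the same
    memory block; the weight is the smaller of their access counts.
    """
    # block index -> thread id -> number of accesses to that block
    accesses = defaultdict(lambda: defaultdict(int))
    for thread_id, address in trace:
        accesses[address // block_size][thread_id] += 1

    comm = [[0] * num_threads for _ in range(num_threads)]
    for counts in accesses.values():
        for a, b in combinations(sorted(counts), 2):
            weight = min(counts[a], counts[b])
            comm[a][b] += weight
            comm[b][a] += weight
    return comm


def best_pairing(comm):
    """Return (total_weight, pairs) of a maximum-weight perfect matching.

    Exhaustive search used as a simple stand-in for the Edmonds
    matching algorithm; requires an even number of threads.
    """
    def solve(remaining):
        if not remaining:
            return 0, []
        first, rest = remaining[0], remaining[1:]
        best = (-1, [])
        for i, partner in enumerate(rest):
            weight, pairs = solve(rest[:i] + rest[i + 1:])
            weight += comm[first][partner]
            if weight > best[0]:
                best = (weight, [(first, partner)] + pairs)
        return best

    return solve(tuple(range(len(comm))))


# Hypothetical trace: threads 0 and 2 share one block, threads 1 and 3 share two.
trace = [(0, 0), (2, 8), (0, 16), (2, 24),
         (1, 64), (3, 72), (1, 128), (3, 128)]
comm = communication_matrix(trace, num_threads=4)
weight, pairs = best_pairing(comm)
print(pairs)
```

In a mapping of this kind, each resulting pair would typically be placed on cores that share a cache level, and the matching could be repeated on groups of pairs for the next level of the hierarchy; whether the paper proceeds this way is not stated in the abstract.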