Conference: International Conference on Control, Decision and Information Technologies

NUMA-BTLP: A static algorithm for thread classification

Abstract

Although NUMA-aware optimizations are often considered non-portable, this paper argues that extending a compiler that supports parallel APIs with NUMA-aware optimizations significantly improves performance and energy consumption on NUMA systems. On UMA systems, the same optimizations do not degrade performance unless the overhead of calling the mapping functions significantly exceeds the improvement they produce. The paper introduces NUMA-BTLP, a compile-time optimization for the LLVM compiler that determines the type of each thread in the program through static analysis of the code. NUMA-BTLP calls the NUMA-BTDM algorithm, which uses specific PThreads routines to set the CPU affinity of each thread (i.e., its thread-to-core association) according to the type returned by NUMA-BTLP. Together, the algorithms improve thread and data mapping on NUMA systems by executing threads that share data on the same core(s), which allows fast access to data in the L1 cache. The paper demonstrates that task-based parallel code that uses PThreads and may contain shared-memory parallel loops is time- and energy-efficient at runtime when optimized with the two algorithms (LLVM supports task parallelism through the PThreads library and loop parallelism through the OpenMP extension). However, the algorithms are expected to yield runtime energy improvements only on NUMA systems that follow either an energy model with constant energy consumption or an energy model in which each core is powered from a separate source.
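
The abstract does not say which PThreads routines NUMA-BTDM calls, so the following is only a minimal sketch of the general idea of running two data-sharing threads on the same core. It assumes Linux with glibc, where the non-portable `pthread_attr_setaffinity_np` and `pthread_getaffinity_np` calls are available. The names `run_pair_on_core`, `first_allowed_core`, `worker`, and the shared-counter workload are hypothetical and are not taken from the paper.

```c
#define _GNU_SOURCE            /* exposes the glibc-specific *_np affinity calls */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define INCREMENTS 100000

/* Data shared by the two worker threads (hypothetical example workload). */
struct shared {
    pthread_mutex_t lock;
    long counter;
};

struct worker_arg {
    struct shared *data;
    int observed_cpu;          /* core reported by sched_getcpu(), or -1 */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&arg->data->lock);
        arg->data->counter++;
        pthread_mutex_unlock(&arg->data->lock);
    }
    arg->observed_cpu = sched_getcpu();
    return NULL;
}

/* Returns the first core the calling thread may run on, or -1 on error. */
int first_allowed_core(void)
{
    cpu_set_t set;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Runs two data-sharing threads whose affinity is restricted to `core`.
 * Returns the final counter value, or -1 on failure.
 * cpus_out receives the core each worker observed. */
long run_pair_on_core(int core, int cpus_out[2])
{
    assert(core >= 0 && core < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* Set the affinity in the attributes so each thread starts on `core`. */
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setaffinity_np(&attr, sizeof(set), &set) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }

    struct shared data;
    data.counter = 0;
    pthread_mutex_init(&data.lock, NULL);

    struct worker_arg args[2] = { { &data, -1 }, { &data, -1 } };
    pthread_t tid[2];
    int started = 0;
    for (; started < 2; started++)
        if (pthread_create(&tid[started], &attr, worker, &args[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_attr_destroy(&attr);
    pthread_mutex_destroy(&data.lock);
    if (started != 2)
        return -1;

    cpus_out[0] = args[0].observed_cpu;
    cpus_out[1] = args[1].observed_cpu;
    return data.counter;
}
```

A caller would pass a core obtained from `first_allowed_core()` to `run_pair_on_core()` and link with `-pthread`. Setting the affinity through the thread attributes, rather than calling `pthread_setaffinity_np` after creation, is a design choice of this sketch: it avoids a window in which a new thread could start on a different core.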