Although NUMA-aware optimizations are often considered non-portable, this paper shows that extending a compiler that supports the compilation of parallel APIs with NUMA-aware optimizations significantly improves performance and energy consumption on NUMA systems, while on UMA systems the optimizations do not degrade performance unless the overhead of calling the mapping functions is significantly larger than the improvement the optimizations produce. The paper introduces the NUMA-BTLP algorithm, a compile-time optimization for the LLVM compiler, which determines the type of each thread in the program code through a static analysis of that code. NUMA-BTLP then calls the NUMA-BTDM algorithm, which uses specific PThreads routines to set the CPU affinities of the threads (i.e., the thread-to-core associations) depending on the thread type returned by NUMA-BTLP. The two algorithms improve thread and data mapping on NUMA systems by executing threads that share data on the same core(s), allowing fast access to shared data in the L1 cache. The paper shows that task-based parallel code that uses PThreads, and that may contain shared-memory parallel loops (LLVM supports both task and loop parallelism, through the PThreads library and the OpenMP extension, respectively), is time- and energy-efficient at runtime when optimized with the two algorithms. However, the algorithms are expected to produce runtime energy improvements only on NUMA systems whose energy model assumes constant energy consumption, or in which each core is powered from a separate source.