XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures

Abstract

Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes, with scheduling that relies on static partitioning and a cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports a data-flow task model and a locality-aware work stealing scheduler. XKaapi enables task multi-implementation on CPU or GPU and multi-level parallelism with different grain sizes. We show performance results on two dense linear algebra kernels, matrix product (GEMM) and Cholesky factorization (POTRF), to evaluate XKaapi on a heterogeneous architecture composed of two hexa-core CPUs and eight NVIDIA Fermi GPUs. Our conclusion is two-fold. First, fine-grained parallelism and online scheduling achieve performance results as good as static strategies, and in most cases outperform them. This is due to an improved work stealing strategy that includes locality information, a very lightweight implementation of the tasks in XKaapi, and an optimized search for ready tasks. Second, the multi-level parallelism on multiple CPUs and GPUs enabled by XKaapi led to a highly efficient Cholesky factorization. Using eight NVIDIA Fermi GPUs and four CPUs, we measure up to 2.43 TFlop/s on double precision matrix product and 1.79 TFlop/s on Cholesky factorization, and 5.09 TFlop/s and 3.92 TFlop/s, respectively, in single precision.
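To make the data-flow task model mentioned in the abstract more concrete, the following is a minimal Python sketch of how a runtime could derive task dependencies from declared read and write accesses, applied to a tiled Cholesky factorization. It is an illustration only and not XKaapi's API: the names Task, build_dependencies, execution_waves and tiled_cholesky_tasks are hypothetical, tiles are identified by (row, column) pairs, and the sketch does not model work stealing, locality, or GPU execution.

```python
from collections import defaultdict


class Task:
    """A task that declares which tiles it reads and which it writes (hypothetical type)."""

    def __init__(self, name, reads=(), writes=()):
        self.name = name
        self.reads = set(reads)
        self.writes = set(writes)


def build_dependencies(tasks):
    """Derive predecessor sets from declared accesses, following submission order."""
    last_writer = {}                          # tile -> index of the last task that wrote it
    readers_since_write = defaultdict(list)   # tile -> tasks that read it since that write
    preds = [set() for _ in tasks]
    for i, task in enumerate(tasks):
        for tile in task.reads | task.writes:
            if tile in last_writer:           # read-after-write or write-after-write
                preds[i].add(last_writer[tile])
        for tile in task.writes:              # write-after-read
            preds[i].update(readers_since_write[tile])
        for tile in task.writes:
            last_writer[tile] = i
            readers_since_write[tile] = []
        for tile in task.reads - task.writes:
            readers_since_write[tile].append(i)
    return preds


def execution_waves(tasks, preds):
    """Group tasks into waves; every task in a wave has all of its predecessors done."""
    done, waves = set(), []
    while len(done) < len(tasks):
        ready = [i for i in range(len(tasks)) if i not in done and preds[i] <= done]
        waves.append([tasks[i].name for i in ready])
        done.update(ready)
    return waves


def tiled_cholesky_tasks(n):
    """Submit the tasks of a right-looking tiled Cholesky on an n x n grid of tiles."""
    tasks = []
    for k in range(n):
        tasks.append(Task(f"potrf({k},{k})", reads=[(k, k)], writes=[(k, k)]))
        for i in range(k + 1, n):
            tasks.append(Task(f"trsm({i},{k})", reads=[(k, k), (i, k)], writes=[(i, k)]))
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                kind = "syrk" if i == j else "gemm"
                tasks.append(Task(f"{kind}({i},{j})#{k}",
                                  reads=[(i, k), (j, k), (i, j)], writes=[(i, j)]))
    return tasks


if __name__ == "__main__":
    tasks = tiled_cholesky_tasks(3)
    for wave in execution_waves(tasks, build_dependencies(tasks)):
        print(wave)
```

Each printed wave lists tasks that could run concurrently once the previous waves have finished. A real scheduler such as the one described in the abstract would not wait for whole waves; it would hand ready tasks to idle CPU or GPU workers as soon as their inputs are produced.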

Bibliographic details

Source
IEEE International Parallel and Distributed Processing Symposium | 2013 | pp. 1299-1308 | 10 pages
Conference location: Boston, MA (US)
Authors
Gautier Thierry; Lima Joao V.F.; Maillard Nicolas; Raffin Bruno
Original format: PDF
Keywords
Data-Flow task model; Dense Linear Algebra; Heterogeneous architectures; High Performance Computing; Locality Aware Work Stealing

Similar literature

1. Faithful performance prediction of a dynamic task-based runtime system for heterogeneous multi-core architectures [J]. Luka Stanisic, Samuel Thibault, Arnaud Legrand, Concurrency and computation: practice and experience. 2015, No. 16
2. Data-flow driven optimal tasks distribution for global heterogeneous systems [J]. Jordi Garcia, Francesc Aguilo, Adria Asensio, Future generation computer systems. 2021, No. Deca
3. Runtime Task Scheduling Using Imitation Learning for Heterogeneous Many-Core Systems [J]. Krishnakumar Anish, Arda Samet E., Goksoy A. Alper, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2020, No. 11
4. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures [C]. Gautier Thierry, Lima Joao V.F., Maillard Nicolas, IEEE International Parallel and Distributed Processing Symposium. 2013
5. Leveraging Data-Flow Information for Efficient Scheduling of Task-Parallel Programs on Heterogeneous Systems [D]. Simsek, Osman Seckin. 2020
6. Optimization of a novel programmable data-flow crypto processor using NSGA-II algorithm [O]. Mahmoud T. El-Hadidi, Hany M. Elsayed, Karim Osama, 2018
7. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures [O]. Thierry Gautier, João V. F. Lima, Nicolas Maillard, 2013
