International Journal of Parallel Programming

Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming

Abstract

Current HPC clusters are composed of several machines with different computational capabilities and different kinds and families of accelerators. Programming these heterogeneous systems efficiently has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, as well as hybrid programming that mixes accelerators and CPU cores. However, in many cases portability compromises efficiency on different devices, and details concerning the coordination of different types of devices must still be handled by the programmer. In this work, we introduce the Multi-Controller, an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions by simplifying data partitioning, mapping, and the transparent deployment of both simple generic kernels that are portable across device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing data movements and hiding the launch details. The results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
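The selection step described in the abstract can be illustrated with a minimal sketch. It is not the library's API: the registry, the function names `register`, `select`, and `launch`, and the device-kind strings are all hypothetical, and plain Python functions stand in for real device kernels. The sketch assumes a specialized implementation is preferred when one is registered for a device kind, and a portable generic kernel is used otherwise.

```python
from typing import Callable, Dict, List

# Hypothetical registry (an assumption, not the library's data structure):
# kernel name -> {device kind -> implementation}.
# The key "generic" holds the portable fallback version.
KernelImpl = Callable[[List[float]], List[float]]
_registry: Dict[str, Dict[str, KernelImpl]] = {}


def register(kernel: str, device_kind: str, impl: KernelImpl) -> None:
    """Record an implementation of `kernel` for one device kind."""
    _registry.setdefault(kernel, {})[device_kind] = impl


def select(kernel: str, device_kind: str) -> KernelImpl:
    """Prefer a specialized implementation; fall back to the generic one."""
    impls = _registry[kernel]
    return impls.get(device_kind, impls["generic"])


def launch(kernel: str, devices: List[str], data: List[float]) -> List[float]:
    """Split data into contiguous blocks, one per device, and run the
    implementation selected for each device on its block."""
    n, d = len(data), len(devices)
    out: List[float] = []
    for i, kind in enumerate(devices):
        lo, hi = i * n // d, (i + 1) * n // d
        out.extend(select(kernel, kind)(data[lo:hi]))
    return out


def scale_generic(block: List[float]) -> List[float]:
    # Portable version, usable on any device kind.
    return [2.0 * x for x in block]


def scale_cpu(block: List[float]) -> List[float]:
    # Stand-in for a version tuned for CPU cores.
    return list(map(lambda x: x * 2.0, block))


register("scale", "generic", scale_generic)
register("scale", "cpu", scale_cpu)

# The "gpu" device has no specialized version, so it gets the generic kernel.
result = launch("scale", ["cpu", "gpu"], [1.0, 2.0, 3.0, 4.0])
```

The even block split here is only a placeholder; a real run-time system would also weigh device capabilities and handle data transfers, which this sketch leaves out.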
