Journal: International Journal of Parallel Programming

Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming

Abstract

Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and hybrid programming that mixes accelerators and CPU cores. However, in many cases, portability compromises efficiency on different devices, and details concerning the coordination of different types of devices must still be handled by the programmer. In this work, we introduce the Multi-Controller, an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions, simplifying data partitioning, mapping, and the transparent deployment of both simple generic kernels that are portable across different device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing data movements and hiding the launch details. The results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
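The selection behavior summarized in the abstract (prefer a specialized implementation for a device type, otherwise fall back to a portable generic kernel, after partitioning the data across devices) can be illustrated with a minimal sketch. This is not the library's API: `Device`, `KernelRegistry`, `partition`, `launch`, the `weight` field, and the proportional split are all hypothetical names and assumptions introduced only for illustration, and the "CUDA" implementation is a plain Python stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: none of these names come from the library in the abstract.
Kernel = Callable[[List[float]], List[float]]


@dataclass(frozen=True)
class Device:
    name: str
    kind: str    # assumed labels such as "cuda_gpu" or "cpu_cores"
    weight: int  # assumed relative compute capability, used to split data


class KernelRegistry:
    GENERIC = "generic"

    def __init__(self) -> None:
        self._impls: Dict[Tuple[str, str], Kernel] = {}

    def register(self, kernel: str, kind: str, impl: Kernel) -> None:
        self._impls[(kernel, kind)] = impl

    def select(self, kernel: str, device: Device) -> Kernel:
        # Prefer an implementation specialized for this device kind,
        # otherwise fall back to the portable generic one.
        specialized = self._impls.get((kernel, device.kind))
        if specialized is not None:
            return specialized
        return self._impls[(kernel, self.GENERIC)]


def partition(data: List[float], devices: List[Device]) -> List[List[float]]:
    # Split data into contiguous chunks proportional to device weights;
    # the last device takes whatever remains after integer rounding.
    total = sum(d.weight for d in devices)
    chunks, start = [], 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            end = len(data)
        else:
            end = start + len(data) * dev.weight // total
        chunks.append(data[start:end])
        start = end
    return chunks


def launch(registry: KernelRegistry, kernel: str,
           data: List[float], devices: List[Device]) -> List[float]:
    # Run each chunk with the implementation chosen for its device
    # (sequentially here; a real run-time would execute concurrently).
    result: List[float] = []
    for dev, chunk in zip(devices, partition(data, devices)):
        result.extend(registry.select(kernel, dev)(chunk))
    return result


def scale_generic(xs: List[float]) -> List[float]:
    return [2.0 * x for x in xs]


def scale_gpu_standin(xs: List[float]) -> List[float]:
    # Stand-in for a vendor-specific (e.g. CUDA) version of the same kernel.
    return [x + x for x in xs]


registry = KernelRegistry()
registry.register("scale", KernelRegistry.GENERIC, scale_generic)
registry.register("scale", "cuda_gpu", scale_gpu_standin)

devices = [Device("gpu0", "cuda_gpu", 3), Device("cpu0", "cpu_cores", 1)]
out = launch(registry, "scale", [1.0, 2.0, 3.0, 4.0], devices)
```

In this sketch the GPU entry resolves to the specialized stand-in while the CPU entry, which has no specialized registration, resolves to the generic kernel; the caller never names an implementation explicitly.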