Source: Journal of Supercomputing

dOCAL: high-level distributed programming with OpenCL and CUDA

Abstract

In the state-of-the-art parallel programming approaches OpenCL and CUDA, so-called host code is required for a program's execution. Efficiently implementing host code is often a cumbersome task, especially when executing OpenCL and CUDA programs on systems with multiple nodes, each comprising different devices, e.g., multi-core CPUs and graphics processing units (GPUs). The programmer is responsible for explicitly managing node and device memory, for synchronizing computations with data transfers between devices of potentially different nodes, and for optimizing data transfers between devices' memories and nodes' main memories, e.g., by using pinned main memory to accelerate data transfers and to overlap the transfers with computations. We develop the distributed OpenCL/CUDA abstraction layer (dOCAL), a novel high-level C++ library that simplifies the development of host code. dOCAL combines six major advantages over the state-of-the-art high-level approaches: (1) it simplifies implementing both OpenCL and CUDA host code by providing a simple-to-use, high-level abstraction API; (2) it supports executing arbitrary OpenCL and CUDA programs; (3) it allows conveniently targeting the devices of different nodes by automatically managing node-to-node communications; (4) it simplifies implementing data transfer optimizations by providing different, specially allocated memory regions, e.g., pinned main memory for overlapping data transfers with computations; (5) it optimizes memory management by automatically avoiding unnecessary data transfers; (6) it enables interoperability between OpenCL and CUDA host code for systems with devices from different vendors. Our experiments show that dOCAL significantly simplifies the development of host code for heterogeneous and distributed systems, with a low runtime overhead.
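
To make advantage (5) concrete, the following is a minimal sketch of one common way a host-side library can avoid unnecessary data transfers: each buffer tracks which of its copies (host or device) is current and copies only when a stale copy is requested. This is an illustration under stated assumptions, not dOCAL's actual API or implementation. The names `TrackedBuffer` and `scale_on_device` are hypothetical, and the "device" memory is simulated with a second host vector so that the example runs without OpenCL or CUDA.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is NOT dOCAL's API.
// The buffer keeps one copy per memory space and remembers which copy is
// up to date, so a transfer happens only when a stale copy is requested.
class TrackedBuffer {
public:
    explicit TrackedBuffer(std::vector<float> data)
        : host_(std::move(data)), device_(host_.size()) {}

    // Device-side access for a kernel that modifies the data.
    // Uploads only if the device copy is stale, then marks the host copy stale.
    std::vector<float>& device_write() {
        if (!device_valid_) {
            device_ = host_;  // stands in for a host-to-device copy
            device_valid_ = true;
            ++transfers_;
        }
        host_valid_ = false;
        return device_;
    }

    // Host-side read access. Downloads only if the host copy is stale.
    const std::vector<float>& host_read() {
        if (!host_valid_) {
            host_ = device_;  // stands in for a device-to-host copy
            host_valid_ = true;
            ++transfers_;
        }
        return host_;
    }

    // Number of simulated transfers performed so far.
    std::size_t transfers() const { return transfers_; }

private:
    std::vector<float> host_;
    std::vector<float> device_;  // simulated device memory (assumption)
    bool host_valid_ = true;
    bool device_valid_ = false;
    std::size_t transfers_ = 0;
};

// Stand-in for a kernel launch: scales every element "on the device".
void scale_on_device(TrackedBuffer& buf, float factor) {
    for (float& x : buf.device_write()) {
        x *= factor;
    }
}
```

In this sketch, calling `scale_on_device` twice in a row on the same buffer triggers a single upload, and the result is downloaded only when `host_read` is first called afterwards. A real host-code library would replace the vector assignments with OpenCL or CUDA copy calls and would have to handle several devices and nodes.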
