
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications


Abstract

Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data-copy steps, and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems, and has been adopted into NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features provides a rare opportunity to study the effectiveness of the hardware support by running important benchmarks on a real runtime and real hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA, double buffering, pinned buffers, and related software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite-difference computation, 2.5× for a 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple, portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.
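For context on the hardware features the abstract refers to, the following is a minimal CUDA sketch, not the paper's HPE interface, of the two inter-GPU transfer paths that CUDA 4.0 and later expose: direct peer DMA when the devices can reach each other over the interconnect, and staging through a pinned (page-locked) host buffer otherwise. The device indices and buffer size are illustrative assumptions, and error checking is omitted for brevity.

// Sketch: move a buffer from GPU 0 to GPU 1, using peer DMA if available,
// otherwise staging through a pinned host buffer. Not the HPE API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;        // 64 MiB payload (assumed size)
    float *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);              // source buffer on GPU 0
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);              // destination buffer on GPU 1

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);   // can GPU 1 reach GPU 0's memory?

    if (canAccess) {
        // Peer DMA path: the copy engines move data directly over the
        // interconnect without touching host memory.
        cudaDeviceEnablePeerAccess(0, 0);        // current device is 1
        cudaMemcpyPeer(dst, 1, src, 0, bytes);
    } else {
        // Fallback path: stage through a pinned host buffer so both halves
        // of the transfer can be done by DMA rather than pageable copies.
        float *staging = nullptr;
        cudaMallocHost(&staging, bytes);
        cudaMemcpy(staging, src, bytes, cudaMemcpyDeviceToHost);
        cudaMemcpy(dst, staging, bytes, cudaMemcpyHostToDevice);
        cudaFreeHost(staging);
    }

    cudaDeviceSynchronize();
    printf("transfer done (peer access: %d)\n", canAccess);

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}

The abstract's double-buffering optimization would split such a transfer into chunks and overlap the two memcpy stages on separate streams; the HPE runtime's contribution is deploying these choices transparently beneath a single portable interface.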
