Towards Cross-Platform Performance Portability of DNN Models using SYCL

Abstract

The incoming deployment of Exascale platforms with a myriad of different architectures and co-processors has prompted the need to provide a software ecosystem based on open standards that can simplify maintaining HPC applications on different hardware. Applications written for a particular platform should be portable to a different one, ensuring performance is as close to the peak as possible. However, it is not expected that key performance routines on relevant HPC applications will be performance portable as is, especially for common building blocks such as BLAS or DNN. The oneAPI initiative aims to tackle this problem by combining a programming model, SYCL, with a set of interfaces for common building blocks that can be optimized for different hardware vendors. In particular, oneAPI includes the oneDNN performance library, which contains building blocks for deep learning applications and frameworks. By using the SYCL programming model, it can integrate easily with existing SYCL and C++ applications, sharing data and executing collaboratively on devices with the rest of the application. In this paper, we introduce a cuDNN backend for oneDNN, which allows running oneAPI applications on NVIDIA hardware, taking advantage of existing building blocks from the CUDA ecosystem. We implement relevant neural networks (ResNet-50 and VGG-16) on native CUDA and also a version of oneAPI with a CUDA backend, and demonstrate that performance portability can be achieved by leveraging existing building blocks for the target hardware.

Bibliographic record

Source
International Workshop on Performance, Portability and Productivity in HPC; International Conference for High Performance Computing, Networking, Storage and Analysis | 2020 | pp. 25-35 | 11 pages
Authors
Mehdi Goli; Kumudha Narasimhan; Ruyman Reyes; Ben Tracy; Daniel Soutar; Svetlozar Georgiev; Evarist M Fomenko; Eugene Chereshnev
Original format: PDF
Keywords
Libraries; Neural networks; Computer architecture; Hardware; Graphics processing units; Programming; Performance evaluation

Similar literature

1. Cross-Platform Performance of a Portable Communication Module and the NASA Finite Volume General Circulation Model [J]. William M. Putman, Shian-Jiann Lin, Bo-Wen Shen. International Journal of High Performance Computing Applications. 2005, No. 3
2. Multidimensional Approach Based on Deep Learning to Improve the Prediction Performance of DNN Models [J]. Mohamed El Fouki, Noura Aknin, Kamal Eddine El Kadiri. International Journal of Emerging Technologies in Learning (iJET). 2019, No. 2
3. Cross-platform expression microarray performance in a mouse model of mitochondrial disease therapy [J]. Zhang Z, Gasser DL, Rappaport EF. Molecular Genetics and Metabolism. 2010, No. 3
4. Evaluating the Performance and Portability of Contemporary SYCL Implementations [C]. Beau Johnston, Jeffrey S. Vetter, Josh Milthorpe. International Workshop on Performance, Portability and Productivity in HPC; International Conference for High Performance Computing, Networking, Storage and Analysis. 2020
5. Design, modeling and performance of a hybrid portable gamma camera [D]. Smith, Leon Eric. 1998
6. Cross-platform expression microarray performance in a mouse model of mitochondrial disease therapy [O]. Zhe Zhang, David L. Gasser, Eric F. Rappaport
7. Cross-Platform Performance of a Portable Communication Module and the NASA Finite Volume General Circulation Model [O]. William M. Putman, Shian-Jiann Lin, Bo-Wen Shen. 2015
