Annual Meeting of the Association for Computational Linguistics

Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units

Abstract

Graphics Processing Units (GPUs) are commonly used to train and evaluate neural networks efficiently. While previous work in deep learning has focused on accelerating operations on dense matrices/tensors on GPUs, comparatively little effort has been devoted to operations involving sparse data structures. Operations using sparse structures are common in natural language models at the input and output layers, because these models operate on sequences over discrete alphabets. We present two new GPU algorithms: one at the input layer, for multiplying a matrix by a few-hot vector (generalizing the more common operation of multiplication by a one-hot vector), and one at the output layer, for a fused softmax and top-N selection (commonly used in beam search). Our methods achieve speedups of up to 7x and 50x, respectively, over state-of-the-art parallel GPU baselines. We also illustrate how our methods scale on different GPU architectures.
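To make the input-layer operation concrete: multiplying a weight matrix by a few-hot vector reduces to a weighted sum of the few columns selected by the vector's nonzero entries, so only k·d matrix elements are ever read. The following is a minimal illustrative CUDA sketch, not the authors' code; the kernel name `few_hot_matvec`, the row-major d×V layout, and the (index, value) sparse representation are all assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sketch of the input-layer operation: multiply a weight
// matrix W (d rows, V columns, row-major) by a "few-hot" vector given in
// sparse form as k (index, value) pairs. The product is a weighted sum
// of k columns of W, so no dense multiply is needed.
__global__ void few_hot_matvec(const float* W, int d, int V,
                               const int* idx, const float* val, int k,
                               float* out) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= d) return;
    float acc = 0.0f;
    for (int i = 0; i < k; ++i)   // k is small (few-hot), so this loop is cheap
        acc += val[i] * W[row * V + idx[i]];
    out[row] = acc;
}

int main() {
    const int d = 4, V = 8, k = 2;
    float hW[d * V];
    for (int i = 0; i < d * V; ++i) hW[i] = (float)i;
    int   hIdx[k] = {1, 6};       // nonzero positions of the few-hot vector
    float hVal[k] = {1.0f, 0.5f}; // their values

    float *W, *val, *out; int *idx;
    cudaMalloc(&W, sizeof(hW));     cudaMemcpy(W, hW, sizeof(hW), cudaMemcpyHostToDevice);
    cudaMalloc(&idx, sizeof(hIdx)); cudaMemcpy(idx, hIdx, sizeof(hIdx), cudaMemcpyHostToDevice);
    cudaMalloc(&val, sizeof(hVal)); cudaMemcpy(val, hVal, sizeof(hVal), cudaMemcpyHostToDevice);
    cudaMalloc(&out, d * sizeof(float));

    few_hot_matvec<<<(d + 255) / 256, 256>>>(W, d, V, idx, val, k, out);

    float hOut[d];
    cudaMemcpy(hOut, out, sizeof(hOut), cudaMemcpyDeviceToHost);
    for (int i = 0; i < d; ++i) printf("out[%d] = %g\n", i, hOut[i]);
    cudaFree(W); cudaFree(idx); cudaFree(val); cudaFree(out);
    return 0;
}
```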
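For the output layer, the fusion idea can also be sketched. Because softmax is monotonic in the logits, the top-N probabilities belong to the top-N logits, so the max, the normalizer, and the selection can share a single pass over the vocabulary-sized logit vector instead of separate kernel launches. The single-block sketch below is hypothetical and far simpler than the paper's parallel selection; it assumes blockDim.x is a power of two, V >= N, and N small (a beam size).

```cuda
// Hypothetical sketch: fused softmax + top-N in one kernel (single block).
// Shared-memory reductions compute the max and the normalizer; thread 0
// then does a simple serial top-N scan. Launch, e.g.:
//   fused_softmax_topn<<<1, 256, 256 * sizeof(float)>>>(d_logits, V, N, d_idx, d_prob);
__global__ void fused_softmax_topn(const float* logits, int V, int N,
                                   int* top_idx, float* top_prob) {
    extern __shared__ float sh[];
    int tid = threadIdx.x;

    // 1) max reduction, for numerical stability of exp()
    float m = -INFINITY;
    for (int i = tid; i < V; i += blockDim.x) m = fmaxf(m, logits[i]);
    sh[tid] = m;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sh[tid] = fmaxf(sh[tid], sh[tid + s]);
        __syncthreads();
    }
    float gmax = sh[0];
    __syncthreads();

    // 2) normalizer Z = sum of exp(logit - max), same shared buffer
    float sum = 0.0f;
    for (int i = tid; i < V; i += blockDim.x) sum += expf(logits[i] - gmax);
    sh[tid] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sh[tid] += sh[tid + s];
        __syncthreads();
    }
    float Z = sh[0];

    // 3) softmax is monotonic, so top-N probabilities = top-N logits;
    //    a serial insertion scan by thread 0 suffices for small N.
    if (tid == 0) {
        for (int n = 0; n < N; ++n) { top_idx[n] = -1; top_prob[n] = -INFINITY; }
        for (int i = 0; i < V; ++i) {
            float x = logits[i];
            int pos = N;
            while (pos > 0 && x > top_prob[pos - 1]) pos--;
            if (pos < N) {                      // insert into descending list
                for (int j = N - 1; j > pos; --j) {
                    top_prob[j] = top_prob[j - 1];
                    top_idx[j]  = top_idx[j - 1];
                }
                top_prob[pos] = x;
                top_idx[pos]  = i;
            }
        }
        for (int n = 0; n < N; ++n)             // convert stored logits to probs
            top_prob[n] = expf(top_prob[n] - gmax) / Z;
    }
}
```

The design point the sketch tries to show is the one named in the abstract: fusing the two steps means the large logit vector is traversed once per reduction inside one launch, rather than materializing a full softmax output and then running a separate top-N kernel over it.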
