Annual Meeting of the Association for Computational Linguistics

Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units



Abstract

Graphics Processing Units (GPUs) are commonly used to train and evaluate neural networks efficiently. While previous work in deep learning has focused on accelerating operations on dense matrices/tensors on GPUs, far fewer efforts have concentrated on operations involving sparse data structures. Operations using sparse structures are common in natural language models at the input and output layers, because these models operate on sequences over discrete alphabets. We present two new GPU algorithms: one at the input layer, for multiplying a matrix by a few-hot vector (generalizing the more common operation of multiplication by a one-hot vector), and one at the output layer, for a fused softmax and top-N selection (commonly used in beam search). Our methods achieve speedups of up to 7x and 50x, respectively, over state-of-the-art parallel GPU baselines. We also illustrate how our methods scale on different GPU architectures.
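To make the input-layer operation concrete, below is a minimal CUDA sketch, not the paper's implementation; the kernel name, parameter names, and row-major layout are assumptions. A few-hot vector x has only k nonzero entries, so y = Wᵀx reduces to a scaled sum of k rows of W, and a kernel can gather exactly those rows rather than running a dense matrix-vector product over all V rows.

```cuda
// Minimal sketch (assumed names/layout, not the paper's code): W is a
// row-major V x d matrix, and the few-hot input x is stored as k
// (index, value) pairs in nz_idx/nz_val. Computes y = W^T x by gathering
// only the k selected rows; work scales with k, not with V.
__global__ void fewhot_matvec(const float *W, const int *nz_idx,
                              const float *nz_val, int k, int d, float *y) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per output dim
    if (j >= d) return;
    float acc = 0.0f;
    for (int i = 0; i < k; ++i)          // k is small for a few-hot vector
        acc += nz_val[i] * W[(size_t)nz_idx[i] * (size_t)d + j];
    y[j] = acc;
}
```

With k = 1 and a nonzero value of 1.0 this degenerates to a one-hot lookup (an embedding-table read); the few-hot generalization keeps that cost profile for small k.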
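For the output layer, the sketch below illustrates only the fusion idea, again with assumed names and a deliberately naive selection step. Since softmax is monotonic, the top-N probabilities correspond to the top-N logits, so a single kernel can compute the normalizer once and select winners on the raw logits without materializing the full softmax; the paper's algorithm is more sophisticated than the N-pass argmax used here.

```cuda
#include <cfloat>

// Minimal sketch (assumed names, not the paper's algorithm): one block per
// row of logits (e.g., one beam hypothesis). Fuses the softmax normalizer
// with top-N selection; assumes N <= V and gridDim.x rows of length V.
#define THREADS 256
__global__ void fused_softmax_topn(const float *logits, int V, int N,
                                   int *top_idx, float *top_prob) {
    const float *row = logits + (size_t)blockIdx.x * V;
    __shared__ float s_val[THREADS];
    __shared__ int   s_idx[THREADS];

    // Row max, for a numerically stable softmax.
    float m = -FLT_MAX;
    for (int j = threadIdx.x; j < V; j += THREADS) m = fmaxf(m, row[j]);
    s_val[threadIdx.x] = m;
    __syncthreads();
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            s_val[threadIdx.x] = fmaxf(s_val[threadIdx.x], s_val[threadIdx.x + s]);
        __syncthreads();
    }
    float rowmax = s_val[0];
    __syncthreads();

    // Normalizer Z = sum_j exp(logit_j - max), computed once for the row.
    float z = 0.0f;
    for (int j = threadIdx.x; j < V; j += THREADS) z += __expf(row[j] - rowmax);
    s_val[threadIdx.x] = z;
    __syncthreads();
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) s_val[threadIdx.x] += s_val[threadIdx.x + s];
        __syncthreads();
    }
    float denom = s_val[0];
    __syncthreads();

    // Naive top-N: N argmax passes over the raw logits (softmax is
    // monotonic, so ranking logits ranks probabilities). O(N*V) rescans;
    // illustration only.
    for (int n = 0; n < N; ++n) {
        float best = -FLT_MAX;
        int besti = -1;
        for (int j = threadIdx.x; j < V; j += THREADS) {
            bool taken = false;
            for (int t = 0; t < n; ++t)
                if (top_idx[blockIdx.x * N + t] == j) taken = true;
            if (!taken && row[j] > best) { best = row[j]; besti = j; }
        }
        s_val[threadIdx.x] = best;
        s_idx[threadIdx.x] = besti;
        __syncthreads();
        for (int s = THREADS / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s && s_val[threadIdx.x + s] > s_val[threadIdx.x]) {
                s_val[threadIdx.x] = s_val[threadIdx.x + s];
                s_idx[threadIdx.x] = s_idx[threadIdx.x + s];
            }
            __syncthreads();
        }
        if (threadIdx.x == 0) {
            top_idx[blockIdx.x * N + n]  = s_idx[0];
            top_prob[blockIdx.x * N + n] = __expf(s_val[0] - rowmax) / denom;
        }
        __syncthreads();  // make the new winner visible before the next pass
    }
}
```

A launch such as fused_softmax_topn<<<beam_size, THREADS>>>(logits, V, N, idx, prob) would process one hypothesis per block; fusing the two steps avoids writing and re-reading the full V-length probability vector, the kind of memory traffic a fused kernel is meant to eliminate.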

