
An efficient sparse-dense matrix multiplication on a multicore system


Abstract

Deep Neural Networks (DNNs) are currently widely used in applications such as speech recognition and computer vision. The core computation kernel of DNN-based applications is large sparse-dense matrix multiplication. Because the performance of existing methods and software libraries for sparse matrix multiplication falls short of expectations, real-time recognition has not yet been achieved. We therefore propose a novel sparse matrix storage format, block-based CSR (compressed sparse row) and COO (coordinate format), called BCSR&BCOO, together with a thread-scalable computing kernel for sparse-dense matrix multiplication, called BSpMM. We evaluate the performance of the proposed data structure and computing kernel in a real application, DNN-based online speech recognition. The experimental results demonstrate up to 4x speedup over Intel MKL on a typical CPU-based multicore system, and a significant improvement in FLOPS is observed as well.
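
The abstract does not spell out the BCSR&BCOO layout or the BSpMM kernel, so the following C++ sketch is only a rough illustration of how a block-based CSR format can drive a sparse-dense multiplication: each nonzero block is stored densely and streamed against the matching slice of the dense right-hand matrix. The struct name, field names, and the fixed square block size are assumptions made for illustration, not the authors' implementation.

```cpp
// Illustrative block-CSR (BCSR) sparse-dense multiply; not the paper's BSpMM kernel.
#include <vector>
#include <cstddef>

struct BCSR {
    int block_rows;              // number of block rows (rows / B)
    int block_size;              // B: square block dimension (assumed fixed)
    std::vector<int> row_ptr;    // size block_rows + 1, offsets into col_idx/vals
    std::vector<int> col_idx;    // block-column index of each stored block
    std::vector<float> vals;     // B*B dense values per stored block, row-major
};

// C (m x n) += A (m x k, BCSR) * X (k x n, dense row-major)
void bcsr_spmm(const BCSR& A, const std::vector<float>& X,
               std::vector<float>& C, int n) {
    const int B = A.block_size;
    for (int br = 0; br < A.block_rows; ++br) {
        for (int p = A.row_ptr[br]; p < A.row_ptr[br + 1]; ++p) {
            const int bc = A.col_idx[p];
            const float* blk = &A.vals[static_cast<std::size_t>(p) * B * B];
            // Multiply one B x B block against the matching B x n slice of X.
            for (int i = 0; i < B; ++i) {
                for (int j = 0; j < B; ++j) {
                    const float a = blk[i * B + j];
                    if (a == 0.0f) continue;   // blocks may be only partially filled
                    const float* xrow = &X[static_cast<std::size_t>(bc * B + j) * n];
                    float* crow = &C[static_cast<std::size_t>(br * B + i) * n];
                    for (int t = 0; t < n; ++t)
                        crow[t] += a * xrow[t];
                }
            }
        }
    }
}
```

The outer loop over block rows carries no data dependences, so it can be parallelized directly (for example with an OpenMP parallel-for), which is presumably the source of the thread scalability claimed for BSpMM; the actual kernel and the BCOO variant may differ substantially from this sketch.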