IEEE International Conference on Communication Technology

An Efficient Sparse-Dense Matrix Multiplication on a Multicore System



Abstract

Deep Neural Networks (DNNs) are currently widely used in applications such as speech recognition and computer vision. The computational kernel of DNN-based applications is large sparse-dense matrix multiplication. Because the performance of existing methods and software libraries for sparse matrix multiplication falls short of expectations, real-time recognition has not yet been achieved. We therefore propose a novel sparse-matrix storage format, a block-based combination of CSR (compressed sparse row) and COO (coordinate) formats called BCSR&BCOO, together with a thread-scalable computing kernel for sparse-dense matrix multiplication called BSpMM. We evaluate the performance of the proposed data structure and computing kernel in a real DNN-based online speech recognition application. The experimental results demonstrate up to 4x speedup over Intel MKL on a typical CPU-based multicore system, along with a significant improvement in achieved FLOPS.
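The abstract does not spell out the BCSR layout, but the general idea of a block-based CSR format can be illustrated with a minimal sketch: the matrix is tiled into fixed-size blocks, only nonzero blocks are stored, and the sparse-dense product then works block-by-block so each inner step is a small dense multiply. The function names, the block size, and the exact layout below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def to_bcsr(A, bs):
    """Convert a dense matrix A into a toy block-CSR structure.

    Only bs x bs blocks containing at least one nonzero are kept.
    Returns (indptr, indices, blocks): indptr[i] .. indptr[i+1] spans
    the stored blocks of block-row i; indices holds block-column ids.
    Layout is an illustrative assumption, not the paper's BCSR&BCOO.
    """
    m, k = A.shape
    assert m % bs == 0 and k % bs == 0
    indptr, indices, blocks = [0], [], []
    for bi in range(m // bs):
        for bj in range(k // bs):
            blk = A[bi * bs:(bi + 1) * bs, bj * bs:(bj + 1) * bs]
            if np.any(blk):          # store only nonzero blocks
                indices.append(bj)
                blocks.append(blk.copy())
        indptr.append(len(indices))
    return indptr, indices, blocks

def bcsr_spmm(indptr, indices, blocks, B, bs, m):
    """Sparse-dense product C = A @ B using the block-CSR structure.

    Each stored block contributes one small dense multiply; zero
    blocks are skipped entirely, which is where the savings come from.
    """
    C = np.zeros((m, B.shape[1]))
    for bi in range(m // bs):
        for p in range(indptr[bi], indptr[bi + 1]):
            bj = indices[p]
            C[bi * bs:(bi + 1) * bs] += blocks[p] @ B[bj * bs:(bj + 1) * bs]
    return C
```

In a real kernel such as the paper's BSpMM, the outer loop over block-rows is what would be distributed across threads, since each block-row writes a disjoint slice of C.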


