IEEE International Conference on Communication Technology

An Efficient Sparse-Dense Matrix Multiplication on a Multicore System



Abstract

Deep Neural Networks (DNNs) are currently widely used in applications such as speech recognition and computer vision. The computational kernel of DNN-based applications is large sparse-dense matrix multiplication. Because the performance of existing methods and software libraries for sparse matrix multiplication falls short of expectations, real-time recognition has not yet been achieved. We therefore propose a novel sparse-matrix storage format, a block-based combination of CSR (compressed sparse row) and COO (coordinate) formats called BCSR&BCOO, together with a thread-scalable computing kernel for sparse-dense matrix multiplication called BSpMM. We evaluate the performance of the proposed data structure and computing kernel in a real DNN-based online speech recognition application. The experimental results demonstrate up to 4x speedup over Intel MKL on a typical CPU-based multicore system, along with a significant improvement in achieved FLOPS.
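The abstract does not spell out the BCSR layout, but the general idea of a block-based CSR format can be illustrated with a minimal sketch: the matrix is tiled into fixed-size blocks, only nonzero blocks are stored, and the sparse-dense product then works block-by-block so each inner step is a small dense multiply. The function names, the block size, and the exact layout below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def to_bcsr(A, bs):
    """Convert a dense matrix A into a toy block-CSR structure.

    Only bs x bs blocks containing at least one nonzero are kept.
    Returns (indptr, indices, blocks): indptr[i] .. indptr[i+1] spans
    the stored blocks of block-row i; indices holds block-column ids.
    Layout is an illustrative assumption, not the paper's BCSR&BCOO.
    """
    m, k = A.shape
    assert m % bs == 0 and k % bs == 0
    indptr, indices, blocks = [0], [], []
    for bi in range(m // bs):
        for bj in range(k // bs):
            blk = A[bi * bs:(bi + 1) * bs, bj * bs:(bj + 1) * bs]
            if np.any(blk):          # store only nonzero blocks
                indices.append(bj)
                blocks.append(blk.copy())
        indptr.append(len(indices))
    return indptr, indices, blocks

def bcsr_spmm(indptr, indices, blocks, B, bs, m):
    """Sparse-dense product C = A @ B using the block-CSR structure.

    Each stored block contributes one small dense multiply; zero
    blocks are skipped entirely, which is where the savings come from.
    """
    C = np.zeros((m, B.shape[1]))
    for bi in range(m // bs):
        for p in range(indptr[bi], indptr[bi + 1]):
            bj = indices[p]
            C[bi * bs:(bi + 1) * bs] += blocks[p] @ B[bj * bs:(bj + 1) * bs]
    return C
```

In a real kernel such as the paper's BSpMM, the outer loop over block-rows is what would be distributed across threads, since each block-row writes a disjoint slice of C.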


