
An efficient sparse-dense matrix multiplication on a multicore system


Abstract

Deep Neural Networks (DNNs) are currently widely used in applications such as speech recognition and computer vision. The core computation kernel of DNN-based applications is large sparse-dense matrix multiplication. Because the performance of existing methods and software libraries for sparse matrix multiplication falls short of expectations, real-time recognition has not yet been achieved. We therefore propose a novel sparse matrix storage format, block-based CSR (compressed sparse row) and COO (coordinate format), called BCSR&BCOO, together with a thread-scalable computing kernel for sparse-dense matrix multiplication, called BSpMM. We evaluate the performance of the proposed data structure and computing kernel in a real application, DNN-based online speech recognition. The experimental results demonstrate up to 4x speedup over Intel MKL on a typical CPU-based multicore system, and a significant improvement in FLOPS is observed as well.
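
The abstract does not spell out the BCSR&BCOO layout or the BSpMM kernel, so the following C++ sketch is only a rough illustration of how a block-based CSR format can drive a sparse-dense multiplication: each nonzero block is stored densely and streamed against the matching slice of the dense right-hand matrix. The struct name, field names, and the fixed square block size are assumptions made for illustration, not the authors' implementation.

```cpp
// Illustrative block-CSR (BCSR) sparse-dense multiply; not the paper's BSpMM kernel.
#include <vector>
#include <cstddef>

struct BCSR {
    int block_rows;              // number of block rows (rows / B)
    int block_size;              // B: square block dimension (assumed fixed)
    std::vector<int> row_ptr;    // size block_rows + 1, offsets into col_idx/vals
    std::vector<int> col_idx;    // block-column index of each stored block
    std::vector<float> vals;     // B*B dense values per stored block, row-major
};

// C (m x n) += A (m x k, BCSR) * X (k x n, dense row-major)
void bcsr_spmm(const BCSR& A, const std::vector<float>& X,
               std::vector<float>& C, int n) {
    const int B = A.block_size;
    for (int br = 0; br < A.block_rows; ++br) {
        for (int p = A.row_ptr[br]; p < A.row_ptr[br + 1]; ++p) {
            const int bc = A.col_idx[p];
            const float* blk = &A.vals[static_cast<std::size_t>(p) * B * B];
            // Multiply one B x B block against the matching B x n slice of X.
            for (int i = 0; i < B; ++i) {
                for (int j = 0; j < B; ++j) {
                    const float a = blk[i * B + j];
                    if (a == 0.0f) continue;   // blocks may be only partially filled
                    const float* xrow = &X[static_cast<std::size_t>(bc * B + j) * n];
                    float* crow = &C[static_cast<std::size_t>(br * B + i) * n];
                    for (int t = 0; t < n; ++t)
                        crow[t] += a * xrow[t];
                }
            }
        }
    }
}
```

The outer loop over block rows carries no data dependences, so it can be parallelized directly (for example with an OpenMP parallel-for), which is presumably the source of the thread scalability claimed for BSpMM; the actual kernel and the BCOO variant may differ substantially from this sketch.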