International Conference on Application-specific Systems, Architectures and Processors

Parallel Multi Channel convolution using General Matrix Multiplication



Abstract

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on GEMM, but not on im2col. Our algorithm eliminates the need for data replication of the input, thereby enabling us to apply the convolution kernels to the input images directly. We have implemented several variants of our algorithm on a general-purpose CPU and an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
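To make the im2col baseline concrete, the sketch below shows the standard lowering the abstract describes: a multi-channel image is expanded into a column matrix, and all M kernels are applied in a single GEMM. The function names (`im2col`, `mcmk_conv_gemm`) are illustrative, not from the paper, and this is the conventional baseline the paper compares against, not the authors' proposed algorithm:

```python
import numpy as np

def im2col(image, kh, kw):
    """Expand a (C, H, W) image into a (C*kh*kw, out_h*out_w) column
    matrix: one column per output position. Note the replication: each
    pixel appears in up to kh*kw columns, which is the memory blow-up
    the paper's direct method avoids."""
    c, h, w = image.shape
    out_h, out_w = h - kh + 1, w - kw + 1  # "valid" convolution, stride 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    row = 0
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # All output positions that read pixel offset (i, j) of
                # channel ch, gathered as one row of the column matrix.
                cols[row] = image[ch, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def mcmk_conv_gemm(image, kernels):
    """MCMK convolution via im2col + one GEMM.
    image: (C, H, W); kernels: (M, C, kh, kw); returns (M, out_h, out_w)."""
    m, c, kh, kw = kernels.shape
    out_h = image.shape[1] - kh + 1
    out_w = image.shape[2] - kw + 1
    cols = im2col(image, kh, kw)       # (C*kh*kw, out_h*out_w)
    k_mat = kernels.reshape(m, -1)     # (M, C*kh*kw), same C/kh/kw ordering
    return (k_mat @ cols).reshape(m, out_h, out_w)  # the single GEMM call
```

For a kh x kw kernel the column matrix holds roughly kh*kw copies of the input, which is why im2col trades memory footprint and locality for the convenience of calling an optimized parallel GEMM.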


