International Conference on Application-specific Systems, Architectures and Processors

Parallel Multi Channel convolution using General Matrix Multiplication



Abstract

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on GEMM, but not on im2col. Our algorithm eliminates the need for data replication of the input, thereby enabling us to apply the convolution kernels to the input images directly. We have implemented several variants of our algorithm on a general-purpose CPU and an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
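To make the im2col baseline concrete, the sketch below shows the standard lowering the abstract describes: a multi-channel image is expanded into a column matrix, and all M kernels are applied in a single GEMM. The function names (`im2col`, `mcmk_conv_gemm`) are illustrative, not from the paper, and this is the conventional baseline the paper compares against, not the authors' proposed algorithm:

```python
import numpy as np

def im2col(image, kh, kw):
    """Expand a (C, H, W) image into a (C*kh*kw, out_h*out_w) column
    matrix: one column per output position. Note the replication: each
    pixel appears in up to kh*kw columns, which is the memory blow-up
    the paper's direct method avoids."""
    c, h, w = image.shape
    out_h, out_w = h - kh + 1, w - kw + 1  # "valid" convolution, stride 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    row = 0
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # All output positions that read pixel offset (i, j) of
                # channel ch, gathered as one row of the column matrix.
                cols[row] = image[ch, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def mcmk_conv_gemm(image, kernels):
    """MCMK convolution via im2col + one GEMM.
    image: (C, H, W); kernels: (M, C, kh, kw); returns (M, out_h, out_w)."""
    m, c, kh, kw = kernels.shape
    out_h = image.shape[1] - kh + 1
    out_w = image.shape[2] - kw + 1
    cols = im2col(image, kh, kw)       # (C*kh*kw, out_h*out_w)
    k_mat = kernels.reshape(m, -1)     # (M, C*kh*kw), same C/kh/kw ordering
    return (k_mat @ cols).reshape(m, out_h, out_w)  # the single GEMM call
```

For a kh x kw kernel the column matrix holds roughly kh*kw copies of the input, which is why im2col trades memory footprint and locality for the convenience of calling an optimized parallel GEMM.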


