Conference: IEEE International Symposium on Computer Architecture and High Performance Computing
High Performance and Portable Convolution Operators for Multicore Processors

Abstract

The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high-performance algorithms for the convolution operator in this type of network. One of these approaches applies the IM2COL transform followed by a general matrix multiplication (GEMM) in order to exploit the highly optimized implementations of the GEMM kernel available in many linear algebra libraries. This approach has two main problems: 1) the large memory workspace required to hold the intermediate matrices generated by the IM2COL transform; and 2) the time needed to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable, high-performance convolution algorithm based on the BLIS implementation of the GEMM kernel that avoids the intermediate workspace by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform while maintaining the portability and performance of the underlying GEMM implementation in BLIS.
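
To make the workspace problem described in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' BLIS-based algorithm. It assumes a single (C, H, W) input, stride 1, and no padding. The function names (`im2col`, `conv_im2col`, `conv_panelwise`, `conv_direct`) and the panel width `nc` are hypothetical, chosen only for illustration. The panel-wise variant loosely imitates the idea of gathering input elements while a small GEMM operand buffer is filled, so that the full IM2COL matrix is never materialized.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, Ho*Wo) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, ho * wo), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row holds one shifted window of one input channel.
                cols[row] = x[ci, i:i + ho, j:j + wo].reshape(-1)
                row += 1
    return cols

def conv_im2col(x, weights):
    """Convolution via explicit IM2COL plus one GEMM; also returns the workspace size."""
    k, c, kh, kw = weights.shape
    ho, wo = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)                 # full intermediate matrix
    out = weights.reshape(k, -1) @ cols      # GEMM: (K, C*kh*kw) x (C*kh*kw, Ho*Wo)
    return out.reshape(k, ho, wo), cols.size

def conv_panelwise(x, weights, nc=16):
    """Same result, but only an nc-column panel of the IM2COL matrix exists at a time."""
    k, c, kh, kw = weights.shape
    ho, wo = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    a = weights.reshape(k, -1)
    out = np.empty((k, ho * wo), dtype=x.dtype)
    panel = np.empty((c * kh * kw, nc), dtype=x.dtype)   # small reusable buffer
    for start in range(0, ho * wo, nc):
        stop = min(start + nc, ho * wo)
        n = stop - start
        oi, oj = np.divmod(np.arange(start, stop), wo)   # output pixel coordinates
        row = 0
        for ci in range(c):
            for i in range(kh):
                for j in range(kw):
                    # Gather directly from the input tensor into the panel.
                    panel[row, :n] = x[ci, oi + i, oj + j]
                    row += 1
        out[:, start:stop] = a @ panel[:, :n]
    return out.reshape(k, ho, wo), panel.size

def conv_direct(x, weights):
    """Reference: accumulate one channel contraction per kernel offset."""
    k, c, kh, kw = weights.shape
    ho, wo = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((k, ho, wo), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            out += np.tensordot(weights[:, :, i, j],
                                x[:, i:i + ho, j:j + wo], axes=([1], [0]))
    return out
```

For a 3-channel 8x8 input and 3x3 filters, the explicit IM2COL matrix has 27 x 36 = 972 elements, about five times the 192 elements of the input itself, whereas the panel buffer with `nc=16` holds 27 x 16 = 432 elements regardless of the image size. In this sketch the gather still costs time; the paper's claim is that folding it into the BLIS structure removes the separate IM2COL pass.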
