High Performance and Portable Convolution Operators for Multicore Processors

Abstract

The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the IM2COL transform followed by a general matrix multiplication (GEMM) in order to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the GEMM kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform, while maintaining the portability and performance of the underlying realization of GEMM in BLIS.
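The abstract contrasts convolution via an explicit IM2COL transform plus GEMM with an approach that avoids the IM2COL workspace by exploiting the structure of BLIS. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation. It assumes a single C x H x W input, K filters flattened into the rows of a K x (C*R*S) matrix A, and that the transform is folded into the routine that packs panels of the GEMM's B operand (the step where BLIS already copies B into a contiguous buffer). All helper names (b_entry, pack_b, conv_gemm, and so on) and the block sizes kc and nc are hypothetical. The sketch also omits the packing of A, the register-blocked micro-kernel and the multithreading that give BLIS its performance.

```python
# Sketch only (not the paper's code): convolution as GEMM, with the IM2COL
# transform either materialized up front or folded into B-panel packing.
# Layouts, block sizes (kc, nc) and all helper names are assumptions.

def out_size(n, k, stride, pad):
    """Output extent along one spatial dimension."""
    return (n + 2 * pad - k) // stride + 1

def input_at(x, c, h, w, H, W):
    """Read x[c][h][w]; out-of-range coordinates act as zero padding."""
    return x[c][h][w] if 0 <= h < H and 0 <= w < W else 0

def b_entry(x, p, q, dims):
    """Entry (p, q) of the IM2COL matrix, computed from the input on demand.

    Row p enumerates (c, r, s) filter taps; column q enumerates (oh, ow).
    """
    _, H, W, R, S, stride, pad, WO = dims
    c, rs = divmod(p, R * S)
    r, s = divmod(rs, S)
    oh, ow = divmod(q, WO)
    return input_at(x, c, oh * stride + r - pad, ow * stride + s - pad, H, W)

def blocked_gemm(A, m, n, k, pack_b, kc=4, nc=3):
    """C = A * B using BLIS-style jc/pc loops; B is only seen via pack_b.

    pack_b(pc, jc, kb, nb) returns the kb x nb panel B[pc:pc+kb][jc:jc+nb].
    Packing of A and the register-blocked micro-kernel are omitted.
    """
    Cm = [[0] * n for _ in range(m)]
    for jc in range(0, n, nc):
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):
            kb = min(kc, k - pc)
            Bc = pack_b(pc, jc, kb, nb)  # small kb x nb buffer per panel
            for i in range(m):
                for p in range(kb):
                    a = A[i][pc + p]
                    for j in range(nb):
                        Cm[i][jc + j] += a * Bc[p][j]
    return Cm

def conv_gemm(x, A, C, H, W, R, S, K, stride=1, pad=0, implicit=True):
    """Convolve x (C x H x W) with K filters stored as rows of A (K x C*R*S)."""
    HO, WO = out_size(H, R, stride, pad), out_size(W, S, stride, pad)
    dims = (C, H, W, R, S, stride, pad, WO)
    kdim, ndim = C * R * S, HO * WO
    if implicit:
        # No IM2COL workspace: gather entries while packing each B panel.
        def pack_b(pc, jc, kb, nb):
            return [[b_entry(x, pc + p, jc + j, dims) for j in range(nb)]
                    for p in range(kb)]
    else:
        # Explicit IM2COL: materialize the whole kdim x ndim matrix first.
        B = [[b_entry(x, p, q, dims) for q in range(ndim)] for p in range(kdim)]

        def pack_b(pc, jc, kb, nb):
            return [B[pc + p][jc:jc + nb] for p in range(kb)]
    return blocked_gemm(A, K, ndim, kdim, pack_b)

def direct_conv(x, A, C, H, W, R, S, K, stride=1, pad=0):
    """Reference: direct summation, same K x (HO*WO) output layout."""
    HO, WO = out_size(H, R, stride, pad), out_size(W, S, stride, pad)
    y = [[0] * (HO * WO) for _ in range(K)]
    for k in range(K):
        for q in range(HO * WO):
            oh, ow = divmod(q, WO)
            for c in range(C):
                for r in range(R):
                    for s in range(S):
                        v = input_at(x, c, oh * stride + r - pad,
                                     ow * stride + s - pad, H, W)
                        y[k][q] += A[k][(c * R + r) * S + s] * v
    return y

def im2col_workspace(C, H, W, R, S, stride=1, pad=0):
    """Entries held by the explicit IM2COL matrix: (C*R*S) x (HO*WO)."""
    return C * R * S * out_size(H, R, stride, pad) * out_size(W, S, stride, pad)

if __name__ == "__main__":
    C, H, W, R, S, K = 2, 5, 5, 3, 3, 4
    x = [[[(7 * c + 3 * h + w) % 5 - 2 for w in range(W)] for h in range(H)]
         for c in range(C)]
    A = [[(k + p) % 3 - 1 for p in range(C * R * S)] for k in range(K)]
    for stride, pad in [(1, 0), (1, 1), (2, 1)]:
        ref = direct_conv(x, A, C, H, W, R, S, K, stride, pad)
        assert conv_gemm(x, A, C, H, W, R, S, K, stride, pad, implicit=False) == ref
        assert conv_gemm(x, A, C, H, W, R, S, K, stride, pad, implicit=True) == ref
    print(im2col_workspace(C, H, W, R, S, 1, 1))  # 450 (input has 50 values)
```

With explicit IM2COL the workspace holds (C*R*S) x (HO*WO) entries, where HO = (H + 2*pad - R) // stride + 1 and likewise for WO. For the 2 x 5 x 5 input with 3 x 3 filters, stride 1 and padding 1 above, that is 450 entries for a 50-value input. The implicit variant only ever holds one kc x nc panel, at the cost of recomputing input indices during packing.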

Bibliographic Details

Source: IEEE International Symposium on Computer Architecture and High Performance Computing, 2020, pp. 91-98 (8 pages)
Authors: Pablo San Juan; Adrián Castelló; Manuel F. Dolz; Pedro Alonso-Jordá; Enrique S. Quintana-Ortí
Original format: PDF
Keywords: Convolutional neural networks, high performance, multicore processors

Similar Articles

1. A Performance and Energy Comparison of Convolution on GPUs, FPGAs, and Multicore Processors [J]. Jeremy Fowers, Greg Brown, John Wernsing. ACM Transactions on Architecture and Code Optimization, 2012, no. 4.

2. Solve Portable Design Problems Using Convenient Concurrency: SMP multicore processors offer many advantages in portable products, if they're properly designed [J]. Grant Martin, Steve Leibson. Portable Design: The Engineer's Resource for Portable Applications, 2008, no. 2.

3. Robust Real-Time Embedded EMG Recognition Framework Using Temporal Convolutional Networks on a Multicore IoT Processor [J]. Marcello Zanghieri, Simone Benatti, Alessio Burrello. IEEE Transactions on Biomedical Circuits and Systems, 2020, no. 2.

4. PVTOL: Providing Productivity, Performance and Portability to DoD Signal Processing Applications on Multicore Processors [C]. DoD High Performance Computing Modernization Program Users Group Conference, 2009.

5. Performance-correctness challenges in emerging heterogeneous multicore processors [D]. Vineeth Thamarassery Mekkat, 2013.

6. Towards Portable Large-Scale Image Processing with High-Performance Computing [O]. Yuankai Huo, Justin Blaber, Stephen M. Damon, 2018.

7. High Performance and Portable Convolution Operators for Multicore Processors [O]. Pablo San Juan, Adrián Castelló, Manuel F. Dolz, 2020.
