IEEE Transactions on Parallel and Distributed Systems

FeatherCNN: Fast Inference Computation with TensorGEMM on ARM Architectures

Abstract

Deep learning is ubiquitous in a wide range of applications spanning research and industry. Compared to the time-consuming iterative training of convolutional neural networks (CNNs), inference is a relatively lightweight operation, making it amenable to execution on mobile devices. Nevertheless, lower latency and higher computational efficiency are crucial to allow for complex models and prolonged battery life. Addressing these challenges, we propose FeatherCNN, a fast inference library for ARM CPUs that targets the performance ceiling of mobile devices. FeatherCNN employs three key techniques: 1) a highly efficient TensorGEMM (generalized matrix multiplication) routine accelerates Winograd convolution on ARM CPUs; 2) general layer optimization based on custom high-performance kernels improves both the computational efficiency and the locality of memory access patterns of non-Winograd layers; and 3) the framework design emphasizes joint layer-wise optimization, using layer fusion to remove redundant calculations and memory movements. Performance evaluation reveals that FeatherCNN significantly outperforms state-of-the-art libraries: a forward propagation pass of VGG-16 on a 64-core ARM server is 48, 14, and 12 times faster than Caffe using OpenBLAS, Caffe2 using Eigen, and NNPACK, respectively. In addition, FeatherCNN is 3.19 times faster than the recently released TensorFlow Lite library on an iPhone 7 Plus. In terms of GEMM performance, FeatherCNN achieves 14.8 and 39.0 percent higher performance than Apple's Accelerate framework on an iPhone 7 Plus and Eigen on a Samsung Galaxy S8, respectively. The source code of the FeatherCNN library is publicly available at https://github.com/tencent/feathercnn.
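
To make the first technique concrete: Winograd convolution trades multiplications for additions. In the one-dimensional F(2,3) case, two outputs of a 3-tap filter cost four multiplications instead of the six a direct computation needs. The sketch below is a minimal illustration of this arithmetic only, not FeatherCNN's implementation, which applies 2-D tile transforms and batches the element-wise products into TensorGEMM calls.

    #include <cstdio>

    // Winograd F(2,3): two outputs of a 3-tap convolution with
    // 4 multiplications instead of 6.
    // d: 4 input values, g: 3 filter taps, y: 2 outputs.
    void winograd_f2_3(const float d[4], const float g[3], float y[2]) {
        // Filter transform (precomputable once per filter).
        float u0 = g[0];
        float u1 = 0.5f * (g[0] + g[1] + g[2]);
        float u2 = 0.5f * (g[0] - g[1] + g[2]);
        float u3 = g[2];
        // Input transform.
        float v0 = d[0] - d[2];
        float v1 = d[1] + d[2];
        float v2 = d[2] - d[1];
        float v3 = d[1] - d[3];
        // Element-wise products: the four multiplications.
        float m0 = u0 * v0, m1 = u1 * v1, m2 = u2 * v2, m3 = u3 * v3;
        // Output transform.
        y[0] = m0 + m1 + m2;
        y[1] = m1 - m2 - m3;
    }

    int main() {
        float d[4] = {1, 2, 3, 4}, g[3] = {0.5f, -1, 2}, y[2];
        winograd_f2_3(d, g, y);
        printf("winograd: %g %g\n", y[0], y[1]);
        // Direct convolution for comparison.
        printf("direct:   %g %g\n",
               d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
               d[1] * g[0] + d[2] * g[1] + d[3] * g[2]);
        return 0;
    }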
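
The GEMM comparisons against Accelerate and Eigen rest on the standard high-performance building block: a register-blocked micro-kernel performing rank-1 updates on a small output tile. The scalar sketch below shows the idea for a 4x4 tile; the packing layout and names are illustrative assumptions, not the library's API. FeatherCNN's actual kernels keep such tiles in ARM NEON vector registers and are tuned per microarchitecture.

    #include <cstddef>

    // Illustrative 4x4 GEMM micro-kernel: C[4][4] += A(4 x k) * B(k x 4).
    // Assumes A is packed so each step reads 4 contiguous column values,
    // and B so each step reads 4 contiguous row values.
    void micro_kernel_4x4(std::size_t k, const float* a, const float* b,
                          float* c, std::size_t ldc) {
        float acc[4][4] = {};  // 16 accumulators, kept in registers
        for (std::size_t p = 0; p < k; ++p) {
            for (int i = 0; i < 4; ++i)
                for (int j = 0; j < 4; ++j)
                    acc[i][j] += a[i] * b[j];  // rank-1 update of the tile
            a += 4;  // next packed column of A
            b += 4;  // next packed row of B
        }
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                c[i * ldc + j] += acc[i][j];
    }

Holding all sixteen accumulators in locals is what allows the compiler (or a hand-written NEON version) to keep the tile entirely in registers; tile shape and loop order are the main per-core tuning knobs.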

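Layer fusion, the third technique, removes intermediate reads and writes by folding element-wise layers into the producing layer's output loop. A minimal sketch with hypothetical function names, contrasting an unfused and a fused bias-plus-ReLU pass:

    #include <algorithm>
    #include <cstddef>

    // Unfused: two full passes over the output tensor, i.e. two
    // round-trips through memory for layers that are typically
    // memory-bound on mobile CPUs.
    void bias_then_relu(float* x, const float* bias,
                        std::size_t channels, std::size_t hw) {
        for (std::size_t c = 0; c < channels; ++c)
            for (std::size_t i = 0; i < hw; ++i)
                x[c * hw + i] += bias[c];
        for (std::size_t i = 0; i < channels * hw; ++i)
            x[i] = std::max(x[i], 0.0f);
    }

    // Fused: one pass, each element is loaded and stored once.
    void bias_relu_fused(float* x, const float* bias,
                         std::size_t channels, std::size_t hw) {
        for (std::size_t c = 0; c < channels; ++c)
            for (std::size_t i = 0; i < hw; ++i)
                x[c * hw + i] = std::max(x[c * hw + i] + bias[c], 0.0f);
    }

For this pair of layers the fused version roughly halves memory traffic, which is the effect the abstract describes as removing redundant calculations and memory movements.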