Conference: International European Conference on Parallel and Distributed Computing

Optimizing FFT-Based Convolution on ARMv8 Multi-core CPUs


Abstract

Convolutional Neural Networks (CNNs) are widely used in machine learning applications but are very time-consuming to run, and most of their execution time is spent in convolutional layers. A common way to implement convolutions is the FFT-based approach, which reduces the arithmetic complexity of convolution without losing much precision. As the performance of ARMv8 multi-core CPUs improves, they can also be used to run CNNs, as Intel x86 CPUs are. In this paper, we present a new parallel FFT-based convolution implementation for ARMv8 multi-core CPUs. The implementation makes efficient use of these CPUs through a series of computation and memory optimizations. Experimental results on two ARMv8 multi-core CPUs show that, in most cases, the new implementation performs much better than two existing approaches.
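
The claim that an FFT-based approach reduces arithmetic complexity can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: the function names `fft`, `direct_conv`, and `fft_conv` are hypothetical, the code is single-threaded pure Python, and it computes a true 1D linear convolution rather than the batched 2D operation used in CNN layers. It only shows the underlying idea, that a pointwise product in the frequency domain replaces the nested multiply-accumulate loop.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized; the caller divides by n.
    """
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_conv(signal, kernel):
    """Direct linear convolution: len(signal) * len(kernel) multiplications."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_conv(signal, kernel):
    """Linear convolution via the convolution theorem, O(M log M) for padded size M."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # pad to a power of two so the radix-2 FFT applies
        size *= 2
    a = fft(list(signal) + [0.0] * (size - len(signal)))
    b = fft(list(kernel) + [0.0] * (size - len(kernel)))
    product = [p * q for p, q in zip(a, b)]  # pointwise multiply in frequency domain
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]
```

Calling `fft_conv([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])` should agree with `direct_conv` on the same inputs up to floating-point rounding, which also reflects the abstract's remark that the FFT route trades a small amount of precision for fewer arithmetic operations.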
