Optimizing FFT-Based Convolution on ARMv8 Multi-core CPUs

Abstract

Convolutional Neural Networks (CNNs) are widely used in machine learning applications, but they are very time-consuming to run, and convolutional layers account for most of their execution time. A common way to implement convolution is the FFT-based approach, which reduces the arithmetic complexity of convolution without losing too much precision. As the performance of ARMv8 multi-core CPUs improves, they too can be used to run CNNs, as Intel x86 CPUs are. In this paper, we present a new parallel FFT-based convolution implementation for ARMv8 multi-core CPUs. The implementation makes efficient use of these CPUs through a series of computation and memory optimizations. Experimental results on two ARMv8 multi-core CPUs show that the new implementation performs much better than two existing approaches in most cases.
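
The FFT-based approach mentioned in the abstract rests on the convolution theorem: transform both inputs, multiply pointwise, and transform back. The sketch below is a minimal 1-D illustration in pure Python, not the paper's ARMv8 implementation; the function names (fft, direct_conv, fft_conv) are hypothetical, and a real CNN layer would work on 2-D tiles, many channels, and typically cross-correlation rather than flipped-kernel convolution.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    len(x) must be a power of two. With inverse=True the twiddle sign is
    flipped; the caller is responsible for dividing by len(x).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_conv(signal, kernel):
    """Full linear convolution with O(N*K) multiply-adds."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out


def fft_conv(signal, kernel):
    """Same result via the convolution theorem: pad, transform, multiply, invert."""
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:  # pad to a power of two so circular wrap-around cannot occur
        n *= 2
    fa = fft(list(signal) + [0.0] * (n - len(signal)))
    fb = fft(list(kernel) + [0.0] * (n - len(kernel)))
    prod = [a * b for a, b in zip(fa, fb)]  # pointwise product in the frequency domain
    inv = fft(prod, inverse=True)
    return [v.real / n for v in inv[:out_len]]  # normalize and drop the padding
```

Calling fft_conv and direct_conv on the same inputs should give results that agree up to floating-point rounding, which is the small precision loss the abstract refers to. The FFT route costs O(n log n) operations per transform instead of O(N*K) multiply-adds, so it tends to pay off only when the kernel is not too small.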

Bibliographic details

Source
International European Conference on Parallel and Distributed Computing | 2020 | pp. 248-262 | 15 pages
Authors
Qinglin Wang; Dongsheng Li; Xiandong Huang; Siqi Shen; Songzhu Mei; Jie Liu
Original format: PDF
Keywords
CNNs; Convolution; FFT; ARMv8; Parallel algorithm

Similar literature

1. Parallel ant colony optimization on multi-core SIMD CPUs [J]. Yi Zhou, Fazhi He, Neng Hou. Future Generation Computer Systems, 2018, No. pta2
2. Optimizing Hash Join with MapReduce on Multi-Core CPUs [J]. Tong YUAN, Zhijing LIU, Hui LIU. IEICE Transactions on Information and Systems, 2016, No. 5
3. Optimizing image processing on multi-core CPUs with Intel parallel programming technologies [J]. Charles Morgan. Computing Reviews, 2015, No. 9
4. Optimizing One by One Direct Convolution on ARMv8 Multi-core CPUs [C]. Qinglin Wang, Dongsheng Li, Songzhu Mei. IEEE International Conference on Joint Cloud Computing, 2020
5. Optimized Parallel Training of Word Vectors on Multi-Core CPU and GPU [D]. Simonton, Trevor McDonald. 2017
6. rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs [O]. Christian Hundt, Andreas Hildebrandt, Bertil Schmidt. 2016
7. Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers [O]. Hager, Georg, Zeiser, Thomas, Wellein, Gerhard. 2008
