IEEE International Symposium on Computer Architecture and High Performance Computing

TASO: Time and Space Optimization for Memory-Constrained DNN Inference



Abstract

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on integer linear programming (ILP) for selecting the primitive operations that implement each convolutional layer. We optimize the trade-off between execution time and memory consumption by: 1) minimizing execution time across the whole network by selecting data layouts and primitive operations to implement each layer; and 2) allocating an appropriately sized workspace that reflects the upper bound of the memory footprint per layer. These two optimization strategies can be used to run any CNN on any platform with a C compiler. Our evaluation with a range of popular ImageNet neural architectures (GoogleNet, AlexNet, VGG, ResNet and SqueezeNet) on the ARM Cortex-A15 yields speedups of 8× compared to greedy-algorithm-based primitive selection, and reduces the memory requirement by 2.2× while sacrificing only 15% of inference time compared to a solver that considers inference time only. In addition, our optimization approach exposes a range of optimal points for different configurations across the Pareto frontier of the memory and latency trade-off, which can be used under arbitrary system constraints.
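The core selection problem can be illustrated with a toy sketch: each layer has several candidate primitives (e.g. im2col+GEMM, direct convolution, Winograd), each with an estimated execution time and a scratch-workspace requirement; since the workspace is shared across layers, its size is the maximum (not the sum) over layers. The plain-Python exhaustive search below is not the paper's ILP formulation, and the per-layer costs are made up purely for illustration, but it shows how a workspace budget trades time for memory.

```python
import itertools

# Hypothetical per-layer candidates: primitive name -> (time in ms, workspace in KB).
# All numbers are invented for illustration only.
layers = [
    {"im2col": (4.0, 900),  "direct": (7.0, 0), "winograd": (3.0, 1400)},
    {"im2col": (6.0, 1200), "direct": (9.5, 0)},
    {"im2col": (2.5, 600),  "direct": (3.0, 0), "winograd": (2.0, 800)},
]

def best_plan(layers, workspace_budget_kb):
    """Pick one primitive per layer minimizing total time, subject to the
    shared-workspace constraint: max per-layer workspace <= budget."""
    best = None
    for choice in itertools.product(*(layer.items() for layer in layers)):
        total_time = sum(t for _, (t, _) in choice)
        workspace = max(w for _, (_, w) in choice)
        if workspace <= workspace_budget_kb:
            if best is None or total_time < best[0]:
                best = (total_time, workspace, [name for name, _ in choice])
    return best  # (total time, workspace used, chosen primitives)

# Unconstrained: the fastest primitive wins in every layer.
print(best_plan(layers, 10**9))   # -> (11.0, 1400, ['winograd', 'im2col', 'winograd'])
# Tight budget: the selection trades some time for a smaller shared workspace.
print(best_plan(layers, 700))     # -> (19.0, 600, ['direct', 'direct', 'im2col'])
```

Sweeping the budget over a range of values traces out the Pareto frontier of the memory and latency trade-off described above; the actual system replaces this exhaustive search with an ILP solver, which scales to deep networks.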


