IEEE International Symposium on Computer Architecture and High Performance Computing

TASO: Time and Space Optimization for Memory-Constrained DNN Inference



Abstract

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on integer linear programming (ILP) for selecting the primitive operations that implement each convolutional layer. We optimize the trade-off between execution time and memory consumption by: 1) minimizing execution time across the whole network by selecting data layouts and primitive operations to implement each layer; and 2) allocating an appropriately sized workspace that reflects the upper bound of the memory footprint per layer. These two optimization strategies can be used to run any CNN on any platform with a C compiler. Our evaluation with a range of popular ImageNet neural architectures (GoogleNet, AlexNet, VGG, ResNet and SqueezeNet) on the ARM Cortex-A15 yields speedups of 8× compared to greedy-algorithm-based primitive selection, and reduces the memory requirement by 2.2× while sacrificing only 15% of inference time compared to a solver that considers inference time only. In addition, our optimization approach exposes a range of optimal points for different configurations across the Pareto frontier of the memory and latency trade-off, which can be used under arbitrary system constraints.
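The core selection problem can be illustrated with a toy sketch: each layer has several candidate primitives (e.g. im2col+GEMM, direct convolution, Winograd), each with an estimated execution time and a scratch-workspace requirement; since the workspace is shared across layers, its size is the maximum (not the sum) over layers. The plain-Python exhaustive search below is not the paper's ILP formulation, and the per-layer costs are made up purely for illustration, but it shows how a workspace budget trades time for memory.

```python
import itertools

# Hypothetical per-layer candidates: primitive name -> (time in ms, workspace in KB).
# All numbers are invented for illustration only.
layers = [
    {"im2col": (4.0, 900),  "direct": (7.0, 0), "winograd": (3.0, 1400)},
    {"im2col": (6.0, 1200), "direct": (9.5, 0)},
    {"im2col": (2.5, 600),  "direct": (3.0, 0), "winograd": (2.0, 800)},
]

def best_plan(layers, workspace_budget_kb):
    """Pick one primitive per layer minimizing total time, subject to the
    shared-workspace constraint: max per-layer workspace <= budget."""
    best = None
    for choice in itertools.product(*(layer.items() for layer in layers)):
        total_time = sum(t for _, (t, _) in choice)
        workspace = max(w for _, (_, w) in choice)
        if workspace <= workspace_budget_kb:
            if best is None or total_time < best[0]:
                best = (total_time, workspace, [name for name, _ in choice])
    return best  # (total time, workspace used, chosen primitives)

# Unconstrained: the fastest primitive wins in every layer.
print(best_plan(layers, 10**9))   # -> (11.0, 1400, ['winograd', 'im2col', 'winograd'])
# Tight budget: the selection trades some time for a smaller shared workspace.
print(best_plan(layers, 700))     # -> (19.0, 600, ['direct', 'direct', 'im2col'])
```

Sweeping the budget over a range of values traces out the Pareto frontier of the memory and latency trade-off described above; the actual system replaces this exhaustive search with an ILP solver, which scales to deep networks.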


