...
Journal: Computer Architecture News

Optimizing CNNs on Multicores for Scalability, Performance and Goodput

Abstract

Convolutional Neural Networks (CNNs) are a class of Artificial Neural Networks (ANNs) that are highly efficient at the pattern recognition tasks underlying difficult AI problems in a variety of domains, such as speech recognition, object recognition, and natural language processing. CNNs are, however, computationally intensive to train. This paper presents the first characterization of the performance optimization opportunities for training CNNs on CPUs. Our characterization includes insights based on the structure of the network itself (i.e., the intrinsic arithmetic intensity of the convolution and its scalability under parallelism) as well as dynamic properties of its execution (i.e., the sparsity of the computation). Given this characterization, we present an automatic framework called spg-CNN for optimizing CNN training on CPUs. It comprises a computation scheduler for efficient parallel execution and two code generators: one that optimizes for sparsity, and another that optimizes for spatial reuse in convolutions. We evaluate spg-CNN using convolutions from a variety of real-world benchmarks, and show that spg-CNN can train CNNs faster than state-of-the-art approaches by an order of magnitude.
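The abstract's notion of intrinsic arithmetic intensity (floating-point operations per byte of memory traffic) can be illustrated with a back-of-the-envelope calculation for a single convolutional layer. The sketch below is not taken from the paper; the formula, function name, and parameters are illustrative assumptions for a stride-1, no-padding convolution, counting each multiply-accumulate as two operations and assuming each tensor is moved to or from memory exactly once.

```python
def conv_arithmetic_intensity(c_in, c_out, k, h, w, dtype_bytes=4):
    """Estimate FLOPs per byte for one stride-1 'valid' convolution layer.

    c_in, c_out : input / output channel counts
    k           : square kernel size (k x k)
    h, w        : input spatial dimensions
    dtype_bytes : bytes per element (4 for float32)
    """
    h_out, w_out = h - k + 1, w - k + 1  # output size with no padding, stride 1
    # Each output element needs c_in * k * k multiply-accumulates (2 ops each).
    flops = 2 * c_out * c_in * k * k * h_out * w_out
    # Assume input, weights, and output each cross the memory interface once.
    bytes_moved = dtype_bytes * (c_in * h * w
                                 + c_out * c_in * k * k
                                 + c_out * h_out * w_out)
    return flops / bytes_moved
```

For example, a 3-to-64-channel 3x3 convolution on a 32x32 input yields roughly 12 FLOPs/byte under these assumptions; layers with small channel counts sit much lower, which is one reason the paper characterizes intensity per convolution rather than assuming all layers are compute-bound.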