IEEE International Symposium on Performance Analysis of Systems and Software

SeqPoint: Identifying Representative Iterations of Sequence-Based Neural Networks


Abstract

The ubiquity of deep neural networks (DNNs) continues to rise, making them a crucial application class for hardware optimizations. However, detailed profiling and characterization of DNN training remains difficult as these applications often run for hours to days on real hardware. Prior works have exploited the iterative nature of DNNs to profile a few training iterations to represent the entire training run. While such a strategy is sound for networks like convolutional neural networks (CNNs), where the nature of the computation is largely input-independent, we observe in this work that this approach is sub-optimal for sequence-based neural networks (SQNNs) such as recurrent neural networks (RNNs). The amount and nature of computations in SQNNs can vary for each input, resulting in heterogeneity across iterations. Thus, arbitrarily selecting a few iterations is insufficient to accurately summarize the behavior of the entire training run. To tackle this challenge, we carefully study the factors that impact SQNN training iterations and identify input sequence length as the key determining factor for variations across iterations. We then use this observation to characterize all iterations of an SQNN training run (requiring no profiling or simulation of the application) and select representative iterations, which we term SeqPoints. We analyze two state-of-the-art SQNNs, DeepSpeech2 and Google's Neural Machine Translation (GNMT), and show that SeqPoints can represent their entire training runs accurately, resulting in geomean errors of only 0.11% and 0.53%, respectively, when projecting overall runtime, and 0.13% and 1.50% when projecting speedups due to architectural changes. This high accuracy is achieved while reducing the time needed for profiling by 345x and 214x for the two networks compared to full training runs. As a result, SeqPoint can enable analysis of SQNN training runs in mere minutes instead of hours or days.
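The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea it describes: since each iteration's input sequence lengths are known from the dataset without any profiling, iterations can be clustered by sequence length, one representative per cluster can be profiled, and full-run runtime can be projected as a weighted sum. The function names, the 1-D k-means choice, and all parameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_seqpoints(seq_lengths, num_clusters=10, iters=25):
    """Illustrative SeqPoint-style selection: cluster training iterations
    by their input sequence length (known from the dataset, so no
    profiling or simulation is needed) and pick one representative
    iteration per cluster.

    Returns (rep_indices, weights), where weights[i] is the fraction of
    all iterations that representative i stands for.
    """
    seq_lengths = np.asarray(seq_lengths, dtype=float)
    # Initialize cluster centers at evenly spaced quantiles of the
    # observed sequence lengths, then run a simple 1-D k-means.
    centers = np.quantile(seq_lengths, np.linspace(0.0, 1.0, num_clusters))
    for _ in range(iters):
        labels = np.argmin(np.abs(seq_lengths[:, None] - centers[None, :]), axis=1)
        for k in range(num_clusters):
            members = seq_lengths[labels == k]
            if members.size:
                centers[k] = members.mean()
    labels = np.argmin(np.abs(seq_lengths[:, None] - centers[None, :]), axis=1)

    rep_indices, weights = [], []
    for k in range(num_clusters):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # empty cluster (possible with duplicate quantiles)
        # Representative: the member whose length is closest to the center.
        rep_indices.append(members[np.argmin(np.abs(seq_lengths[members] - centers[k]))])
        weights.append(members.size / seq_lengths.size)
    return np.array(rep_indices), np.array(weights)

def project_total_runtime(rep_runtimes, weights, total_iterations):
    """Project full-training runtime from the profiled representatives
    only: weighted mean per-iteration time, scaled to all iterations."""
    return total_iterations * float(np.dot(rep_runtimes, weights))
```

Under this sketch, only the handful of representatives is ever profiled on hardware (or simulated), which is how a SeqPoint-style approach can cut profiling time by the orders of magnitude the abstract reports while still weighting each sequence-length regime by how often it occurs.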
