IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Approximate Computing for Long Short Term Memory (LSTM) Neural Networks


Abstract

Long Short Term Memory (LSTM) networks are a class of recurrent neural networks that are widely used for machine learning tasks involving sequences, including machine translation, text generation, and speech recognition. Large-scale LSTMs, which are deployed in many real-world applications, are highly compute-intensive. To address this challenge, we propose AxLSTM, an application of approximate computing to improve the execution efficiency of LSTMs. An LSTM is composed of cells, each of which contains a cell state along with multiple gating units that control the addition and removal of information from the state. The LSTM execution proceeds in timesteps, with a new symbol of the input sequence processed at each timestep. AxLSTM consists of two techniques: Dynamic Timestep Skipping (DTS) and Dynamic State Reduction (DSR). DTS identifies, at runtime, input symbols that are likely to have little or no impact on the cell state and skips evaluating the corresponding timesteps. In contrast, DSR reduces the size of the cell state in accordance with the complexity of the input sequence, leading to a reduced number of computations per timestep. We describe how AxLSTM can be applied to the most common application of LSTMs, viz., sequence-to-sequence learning. We implement AxLSTM within the TensorFlow deep learning framework and evaluate it on three state-of-the-art sequence-to-sequence models. On a 2.7 GHz Intel Xeon server with 128 GB memory and 32 processor cores, AxLSTM achieves 1.08×-1.31× speedups with minimal loss in quality, and 1.12×-1.37× speedups when moderate reductions in quality are acceptable.
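
The abstract's description of an LSTM cell (a cell state plus gating units that control what information is added to and removed from that state) corresponds to the standard LSTM update, sketched below in NumPy. This is an illustrative sketch only, not the paper's TensorFlow implementation; the stacked weight layout is an assumption made for the illustration.

# One LSTM timestep: the gating units decide what to forget from the cell
# state and what new information to add, as described in the abstract.
# Assumed layout: the four gate blocks (input, forget, candidate, output)
# are stacked row-wise in W, U, and b.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # x: input symbol (D,), h: hidden state (H,), c: cell state (H,)
    # W: (4H, D), U: (4H, H), b: (4H,)
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gate activations in [0, 1]
    g = np.tanh(g)                                 # candidate values
    c_new = f * c + i * g        # remove (forget) old information, add new
    h_new = o * np.tanh(c_new)
    return h_new, c_new

The per-timestep cost is dominated by the matrix-vector products with W and U, which is what the two AxLSTM techniques reduce: DTS by evaluating fewer timesteps, and DSR by shrinking the state those products operate on.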
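
As a rough sequence-level illustration of the two techniques, the sketch below skips timesteps whose input symbol is judged low-impact (in the spirit of DTS) and slices the gate weights to run with a smaller cell state (in the spirit of DSR). The skip proxy, the threshold, and the slicing scheme are assumptions for illustration; the abstract does not specify the paper's actual runtime criteria.

# Illustrative sketches of Dynamic Timestep Skipping (DTS) and Dynamic State
# Reduction (DSR). step_fn is any per-timestep update, e.g. the lstm_step
# sketch above; the heuristics here are assumptions, not the paper's methods.
import numpy as np

def run_with_dts(xs, h, c, step_fn, params, skip_threshold=0.05):
    # DTS (illustrative): if a cheap proxy suggests the input symbol would
    # barely change the cell state, carry (h, c) over and skip the timestep.
    skipped = 0
    for x in xs:
        if np.linalg.norm(x) < skip_threshold * (np.linalg.norm(c) + 1e-8):
            skipped += 1          # low-impact symbol: state reused unchanged
            continue
        h, c = step_fn(x, h, c, *params)
    return h, c, skipped

def reduce_state(W, U, b, hidden_size, reduced_size):
    # DSR (illustrative): keep only the first reduced_size units of each gate
    # block, so every remaining timestep runs on a smaller cell state and
    # does proportionally fewer multiply-accumulates.
    k = reduced_size
    rows = np.concatenate([np.arange(g * hidden_size, g * hidden_size + k)
                           for g in range(4)])
    return W[rows], U[rows][:, :k], b[rows]

With lstm_step as step_fn and (W, U, b) as params, skipped timesteps cost only the proxy check, and a reduced weight set shrinks each timestep's matrix-vector work from roughly 4H(D + H) to 4k(D + k) multiply-accumulates.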
