LSTM: A Search Space Odyssey

Abstract

Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance (fANOVA) framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
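
To make the abstract's terminology concrete, the sketch below shows one forward step of a standard (vanilla) LSTM cell in Python/NumPy, highlighting the forget gate and the tanh output activation that the study identifies as the architecture's most critical components. This is a minimal illustration only: peephole connections, which the paper's vanilla variant also includes, are omitted, and the function names and weight layout are assumptions of this sketch rather than details taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a vanilla LSTM cell (peepholes omitted).

    x      : input vector, shape (n_in,)
    h_prev : previous hidden state, shape (n_hid,)
    c_prev : previous cell state, shape (n_hid,)
    W      : stacked gate weights, shape (4 * n_hid, n_in + n_hid)
    b      : stacked gate biases, shape (4 * n_hid,)
    """
    n_hid = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0 * n_hid:1 * n_hid])   # input gate
    f = sigmoid(z[1 * n_hid:2 * n_hid])   # forget gate: scales how much old cell state is kept
    g = np.tanh(z[2 * n_hid:3 * n_hid])   # block input (candidate cell update)
    o = sigmoid(z[3 * n_hid:4 * n_hid])   # output gate
    c = f * c_prev + i * g                # new cell state
    h = o * np.tanh(c)                    # output activation (tanh) applied before the output gate
    return h, c

# Illustrative usage with random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = 0.1 * rng.standard_normal((4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)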
