Long short-term memory recurrent neural network-based acoustic model using connectionist temporal classification on a large-scale training corpus

Abstract

Acoustic models based on Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have brought large improvements over acoustic models based on the Gaussian Mixture Model (GMM). However, these hybrid models require a forced-aligned Hidden Markov Model (HMM) state sequence obtained from a GMM-based acoustic model. Training therefore takes a long time, because both the GMM-based acoustic model and the deep learning-based acoustic model must be trained. To solve this problem, an acoustic model using the Connectionist Temporal Classification (CTC) algorithm is proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the forced-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC used small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% on clean speech and 15.01% on noisy speech, which is similar to the performance of the acoustic model based on the hybrid method.
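Two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The first function shows the standard CTC output mapping, which merges repeated frame labels and then drops blanks, so no frame-level forced alignment is needed to read off a label sequence. The second shows how WER is conventionally computed as word-level edit distance divided by the number of reference words. The blank symbol `_`, the function names, and the example strings are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; the paper's label set is not given here


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to an output sequence.

    Standard CTC mapping: merge consecutive repeats, then remove blanks.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Blanks separate genuine repeats, so "ll" survives the merge step.
    print(ctc_collapse(list("__hh_e_ll_l_oo_")))
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, the reported 6.18% corresponds to roughly six word errors per hundred reference words on clean speech.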

Bibliographic details

  • Source
    Communications, China | 2017, Issue 9 | pp. 23-31 | 9 pages
  • Author affiliations

    Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul, 04107, Republic of Korea;

    Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul, 04107, Republic of Korea;

    Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul, 04107, Republic of Korea;

    Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul, 04107, Republic of Korea;

    Department of Information and Communication Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan, Gyeongbuk, 38541, Republic of Korea;

    School of Electronics Engineering, Kyungpook National University, 80 Daehakro, Bukgu, Daegu, 41566, Republic of Korea;

    Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul, 04107, Republic of Korea;

  • Original format: PDF
  • Language: English
  • Keywords

    Hidden Markov models; Logic gates; Acoustics; Training; Mathematical model; Computer architecture; Speech;

