Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models

Abstract

This paper presents a novel decoding framework for acoustic models (AMs) based on end-to-end neural networks (e.g., connectionist temporal classification). End-to-end training of AMs has recently demonstrated high accuracy and efficiency in automatic speech recognition (ASR). Although such an end-to-end AM implicitly contains a language model (LM), decoding with the trained AM still requires integrating an external LM trained on a large text corpus to achieve the best results. Most studies combine the end-to-end AM score and the external LM score by naive interpolation, a choice made empirically and without theoretical justification. In this paper, we propose a more theoretically sound decoding framework derived from maximizing the posterior probability of a word sequence given an observation. As a consequence of the theory, a subword LM is newly introduced to seamlessly integrate the external LM score with the end-to-end AM score. The proposed method requires only a small modification of the conventional weighted finite-state transducer-based implementation and does not substantially increase the graph size. We evaluated the proposed decoding framework in ASR experiments on the Wall Street Journal corpus and the Corpus of Spontaneous Japanese. The results showed that the proposed framework achieved significant and consistent improvements over the conventional interpolation-based decoding framework.
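
The abstract does not state the scoring formula, so the following is only a minimal sketch of one plausible reading of the idea, not the authors' implementation. It assumes that the end-to-end AM outputs a subword-sequence posterior P(S|X), that a subword LM supplies a prior P(S), and that dividing the posterior by this prior (Bayes' rule) gives a likelihood-like score that can then be combined with the external word LM P(W). The function names, the weights, and the toy probabilities are all hypothetical.

```python
import math


def interpolation_score(log_p_s_given_x, log_p_w, lm_weight=1.0):
    """Conventional score: log P(S|X) + lm_weight * log P(W)."""
    return log_p_s_given_x + lm_weight * log_p_w


def map_style_score(log_p_s_given_x, log_p_s, log_p_w,
                    lm_weight=1.0, prior_weight=1.0):
    """Assumed MAP-style score: the subword prior log P(S) is subtracted
    before the external word LM score is added."""
    return log_p_s_given_x - prior_weight * log_p_s + lm_weight * log_p_w


# Toy hypotheses (made-up numbers). Both have the same word LM probability,
# but the first uses a subword sequence that a subword LM rates as frequent.
hypotheses = {
    "frequent": {"am": math.log(0.5), "sub": math.log(0.10), "lm": math.log(0.02)},
    "rare": {"am": math.log(0.4), "sub": math.log(0.01), "lm": math.log(0.02)},
}

# Best hypothesis under naive interpolation of AM and word LM scores.
best_interp = max(
    hypotheses,
    key=lambda h: interpolation_score(hypotheses[h]["am"], hypotheses[h]["lm"]),
)

# Best hypothesis once the subword prior is divided out.
best_map = max(
    hypotheses,
    key=lambda h: map_style_score(
        hypotheses[h]["am"], hypotheses[h]["sub"], hypotheses[h]["lm"]
    ),
)

print(best_interp, best_map)
```

In this toy setting the two criteria pick different hypotheses: the interpolation score keeps whatever preference for frequent subword sequences the AM posterior already carries, while the prior-corrected score removes it before the word LM is applied. Whether the paper weights or normalizes these terms differently cannot be determined from the abstract alone.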
