Computer Speech and Language

Advances in subword-based HMM-DNN speech recognition across languages


Abstract

We describe a novel way to implement subword language models in speech recognition systems based on weighted finite-state transducers, hidden Markov models, and deep neural networks. The acoustic models are built on graphemes so that no pronunciation dictionaries are needed, and they can be used together with any type of subword language model, including character models. The advantages of short subword units are good lexical coverage, reduced data sparsity, and avoiding vocabulary mismatches in adaptation. Moreover, constructing neural network language models (NNLMs) is more practical, because the input and output layers are small. We also propose methods for combining the benefits of different types of language model units by reconstructing and combining the recognition lattices. We present an extensive evaluation of various subword units on speech datasets of four languages: Finnish, Swedish, Arabic, and English. The results show that the benefits of short subwords are even more consistent with NNLMs than with traditional n-gram language models. Combining different acoustic models and language models with various units improves the results further. For all four datasets we obtain the best results published so far. Our approach performs well even for English, where phoneme-based acoustic models and word-based language models typically dominate: the phoneme-based baseline performance can be reached and improved upon by 4% using graphemes, but only when several grapheme-based models are combined. Furthermore, combining both grapheme and phoneme models yields a state-of-the-art error rate of 15.9% on the MGB 2018 dev17b test set. For all four languages we also show that the language models perform reasonably well when only limited training data is available.
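
As a rough illustration of the grapheme-based idea in the abstract, the sketch below (ours, not from the paper) builds a lexicon that maps each word to its letter sequence. With such a lexicon the spelling itself serves as the pronunciation, so any word that can be spelled is covered without a hand-crafted pronunciation dictionary, and character or subword units derived from text can share the same grapheme acoustic models. The function name `grapheme_lexicon` and the example words are hypothetical.

```python
# Illustrative sketch only: a grapheme "pronunciation" lexicon.
# A conventional phoneme lexicon requires an expert pronunciation
# for every entry; here each word maps to its own letters.

def grapheme_lexicon(words):
    """Map each word to its sequence of graphemes (letters)."""
    return {w: list(w.lower()) for w in words}

# Any spellable word is covered, including unseen inflections,
# which is what gives subword units their full lexical coverage.
for word, units in grapheme_lexicon(["speech", "recognition", "recognitions"]).items():
    print(word, "->", " ".join(units))
# speech -> s p e e c h
# recognition -> r e c o g n i t i o n
# recognitions -> r e c o g n i t i o n s
```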
