Computer Speech and Language

Advances in subword-based HMM-DNN speech recognition across languages


Abstract

We describe a novel way to implement subword language models in speech recognition systems based on weighted finite-state transducers, hidden Markov models, and deep neural networks. The acoustic models are built on graphemes so that no pronunciation dictionaries are needed, and they can be used together with any type of subword language model, including character models. The advantages of short subword units are good lexical coverage, reduced data sparsity, and avoiding vocabulary mismatches in adaptation. Moreover, constructing neural network language models (NNLMs) is more practical, because the input and output layers are small. We also propose methods for combining the benefits of different types of language model units by reconstructing and combining the recognition lattices. We present an extensive evaluation of various subword units on speech datasets of four languages: Finnish, Swedish, Arabic, and English. The results show that the benefits of short subwords are even more consistent with NNLMs than with traditional n-gram language models. Combining different acoustic models and language models with various units improves the results further. For all four datasets we obtain the best results published so far. Our approach performs well even for English, where phoneme-based acoustic models and word-based language models typically dominate: the phoneme-based baseline performance can be reached and improved upon by 4% using graphemes, but only when several grapheme-based models are combined. Furthermore, combining both grapheme and phoneme models yields a state-of-the-art error rate of 15.9% on the MGB 2018 dev17b test set. For all four languages we also show that the language models perform reasonably well when only limited training data is available.
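
As a rough illustration of the grapheme-based idea in the abstract, the sketch below (ours, not from the paper) builds a lexicon that maps each word to its letter sequence. With such a lexicon the spelling itself serves as the pronunciation, so any word that can be spelled is covered without a hand-crafted pronunciation dictionary, and character or subword units derived from text can share the same grapheme acoustic models. The function name `grapheme_lexicon` and the example words are hypothetical.

```python
# Illustrative sketch only: a grapheme "pronunciation" lexicon.
# A conventional phoneme lexicon requires an expert pronunciation
# for every entry; here each word maps to its own letters.

def grapheme_lexicon(words):
    """Map each word to its sequence of graphemes (letters)."""
    return {w: list(w.lower()) for w in words}

# Any spellable word is covered, including unseen inflections,
# which is what gives subword units their full lexical coverage.
for word, units in grapheme_lexicon(["speech", "recognition", "recognitions"]).items():
    print(word, "->", " ".join(units))
# speech -> s p e e c h
# recognition -> r e c o g n i t i o n
# recognitions -> r e c o g n i t i o n s
```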
