Syllable-Based Acoustic Modeling With Lattice-Free MMI for Mandarin Speech Recognition

Abstract

Most automatic speech recognition (ASR) systems of the past decades have used context-dependent (CD) phones as the fundamental acoustic units. However, these phone-based approaches lack an easy and efficient way to model long-term temporal dependencies. Compared with phones, syllables span a longer time, typically several phones, and therefore have more stable acoustic realizations. In this work, we aim to train a syllable-based acoustic model (AM) for Mandarin ASR with the lattice-free maximum mutual information (LF-MMI) criterion. We expect that the combination of longer linguistic units, the RNN-based model structure, and the sequence-level objective function can result in better modeling of long-term temporal acoustic variations. We make multiple modifications to improve the performance of the syllable-based AM and benchmark our models on two large-scale databases. Experimental results show that the proposed syllable-based AM performs much better than the CD phone-based baseline, especially on noisy test sets, and also decodes faster.
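
For orientation, the sequence-level criterion named in the abstract can be sketched in its commonly cited general form. The notation below is an assumption of this note and is not taken from the paper: O_u is the acoustic observation sequence of utterance u, W_u its reference transcript, M_W the HMM state graph for a transcript W, P(W) a language model probability, and theta the acoustic model parameters.

```latex
% General MMI objective (standard textbook form; symbols are illustrative assumptions)
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u=1}^{U}
    \log
    \frac{p_{\theta}\!\left(\mathbf{O}_u \mid \mathbb{M}_{W_u}\right) P(W_u)}
         {\sum_{W} p_{\theta}\!\left(\mathbf{O}_u \mid \mathbb{M}_{W}\right) P(W)}
```

In the lattice-free variant, the denominator sum is evaluated over a single compact graph built from a unit-level language model instead of per-utterance word lattices. With syllable units, that graph would presumably be defined over syllables instead of phones; the abstract does not give the exact configuration.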


Bibliographic record

Source
International Symposium on Chinese Spoken Language Processing | 2021 | pp. 1-5 | 5 pages
Authors
Jie Li; Zhiyun Fan; Xiaorui Wang; Yan Li
Original format: PDF
Keywords
Hidden Markov models; Linguistics; Linear programming; Acoustics; Decoding; Topology; Noise measurement

