Research on Mongolian Speech Recognition Based on FSMN

Abstract

Deep neural network (DNN) models have achieved significant results on Mongolian speech recognition tasks; however, compared with Chinese, English, and other languages, there is still room for improvement. This paper presents the first application of the Feed-forward Sequential Memory Network (FSMN) to Mongolian speech recognition, modeling long-term dependencies in time series without recurrent feedback. Furthermore, by modeling the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the network input to validate their effectiveness in Mongolian ASR tasks. Finally, discriminative training is conducted on the FSMN for the first time, using the maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR) criteria, respectively. The experimental results show that the FSMN outperforms the DNN in Mongolian ASR, and that using i-vector features combined with Fbank features as the FSMN input, together with discriminative training, yields a 17.9% relative reduction in word error rate (WER) compared with the DNN baseline.
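
The abstract names three ideas that a few lines of code can make concrete: appending an utterance-level i-vector to each Fbank frame, a memory block that summarizes past hidden activations without recurrence, and a relative WER reduction. The sketch below is illustrative only and is not the authors' implementation. The function names, the tap coefficients, and all numeric values are hypothetical, and the abstract does not say which FSMN variant (scalar or vectorized, with or without look-ahead) the paper uses; a look-back-only scalar form is assumed here.

```python
def append_ivector(fbank_frames, ivector):
    """Concatenate one utterance-level i-vector onto every Fbank frame.

    Assumption: a single i-vector per utterance, repeated for each frame.
    """
    return [list(frame) + list(ivector) for frame in fbank_frames]


def fsmn_memory(hidden, taps):
    """Scalar, look-back-only memory block (assumed variant).

    hidden: list of T hidden vectors, each a list of floats
    taps:   scalar coefficients a_0..a_N (hypothetical values)
    Returns m_t = sum_i a_i * h_{t-i}; frames before t = 0 are skipped.
    """
    memory = []
    for t in range(len(hidden)):
        m_t = [0.0] * len(hidden[t])
        for i, a in enumerate(taps):
            if t - i < 0:
                break  # no history before the first frame
            for d, h in enumerate(hidden[t - i]):
                m_t[d] += a * h
        memory.append(m_t)
    return memory


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    frames = append_ivector([[1.0, 2.0], [3.0, 4.0]], [0.5])
    print(frames)
    print(fsmn_memory([[1.0], [2.0], [3.0]], [1.0, 0.5]))
    print(relative_wer_reduction(20.0, 16.0))
```

Because the memory block is a fixed-length weighted sum over earlier frames, it can be computed in a purely feed-forward pass, which is the sense in which long-term context is modeled "without recurrent feedback".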


Bibliographic details

Source
Natural Language Understanding and Intelligent Applications, 2017, pp. 243-254 (12 pages)
Conference location: Dalian (CN)
Authors
Yonghe Wang; Feilong Bao; Hongwei Zhang; Guanglai Gao;
Author affiliations

College of Computer Science, Inner Mongolia University, Huhhot 010021, China;
Original format: PDF
Text language: English
Keywords
Mongolian; Speech recognition; DNN; FSMN; i-vector; Sequence-criterion training


Similar documents

1. Combination of GMM-Based Speech Estimation Method and Temporal Domain SVD-Based Speech Enhancement for Noise Robust Speech Recognition [J]. Masakiyo Fujimoto, Yasuo Ariki. Systems and Computers in Japan, 2007, No. 3.
2. English Phrase Speech Recognition Based on Continuous Speech Recognition Algorithm and Word Tree Constraints [J]. Haifan Du, Haiwen Duan. Complexity, 2021, No. a.
3. Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds [J]. Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, et al. Computer Speech and Language, 2013, No. 3.
4. Research on Mongolian Speech Recognition Based on FSMN [C]. Yonghe Wang, Feilong Bao, Hongwei Zhang, Guanglai Gao. International Conference on Natural Language Processing and Chinese Computing, 2017.
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005.
6. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech [O]. László Tóth, Ildikó Hoffmann, Gábor Gosztolya, et al.
7. The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API [O]. Hyun Jae Yoo, Sungwoong Seo, Sun Woo Im, et al. 2021.
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015.
