Conference: Natural language understanding and intelligent applications

Research on Mongolian Speech Recognition Based on FSMN


Abstract

Deep neural network (DNN) models have achieved significant results on the Mongolian speech recognition task; however, compared with Chinese, English, and other languages, there is still room for improvement. This paper presents the first application of the Feed-forward Sequential Memory Network (FSMN) to Mongolian speech recognition, modeling long-term dependencies in time series without recurrent feedback. Furthermore, by modeling the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the network input to validate their effectiveness in Mongolian automatic speech recognition (ASR). Finally, discriminative training is applied to the FSMN for the first time, using the maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR) criteria, respectively. The experimental results show that the FSMN outperforms the DNN on Mongolian ASR, and that using i-vector features combined with Fbank features as the FSMN input, together with discriminative training, yields a relative word error rate (WER) reduction of 17.9% compared with the DNN baseline.
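The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the three ideas it names: a memory block that summarizes neighboring frames with learned taps instead of recurrent feedback, per-frame Fbank features extended with a speaker i-vector, and a relative WER reduction. It assumes the vectorized FSMN variant (one tap coefficient per hidden dimension per time offset) and a single i-vector per utterance repeated on every frame; the function names, array shapes, and tap counts are hypothetical and are not taken from the paper.

```python
import numpy as np

def fsmn_memory(h, a, c=None):
    """Memory block sketch (assumed vectorized FSMN variant).

    h: (T, D) hidden activations for T frames.
    a: (N1 + 1, D) look-back taps; a[i] weights frame t - i.
    c: optional (N2, D) look-ahead taps; c[j - 1] weights frame t + j.
    Frames outside the utterance are treated as zero.
    """
    T, _ = h.shape
    out = np.zeros_like(h, dtype=float)
    # Weighted sum over the current and past frames (no recurrence).
    for i in range(min(a.shape[0], T)):
        out[i:] += a[i] * h[:T - i]
    # Optional weighted sum over future frames (bidirectional variant).
    if c is not None:
        for j in range(1, min(c.shape[0], T - 1) + 1):
            out[:T - j] += c[j - 1] * h[j:]
    return out

def append_ivector(fbank, ivector):
    """Concatenate one utterance-level i-vector onto every Fbank frame.

    fbank: (T, F) filterbank features; ivector: (K,) speaker vector.
    Returns a (T, F + K) input matrix.
    """
    T = fbank.shape[0]
    return np.hstack([fbank, np.tile(ivector, (T, 1))])

def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

Because the memory block is a fixed-length weighted sum over frames, it can be trained with ordinary back-propagation rather than back-propagation through time. The abstract reports only the relative reduction of 17.9%, not the absolute WER values, so `relative_reduction` is shown without example inputs from the paper.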
