
Research on Mongolian Speech Recognition Based on FSMN


Abstract

Deep neural network (DNN) models have achieved significant results on the Mongolian speech recognition task; however, compared with Chinese, English, and other languages, there is still room for improvement. This paper presents the first application of the feed-forward sequential memory network (FSMN) to Mongolian speech recognition, modeling long-term dependencies in time series without recurrent feedback. Furthermore, to model the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the network input, validating their effectiveness in Mongolian ASR tasks. Finally, discriminative training is conducted on the FSMN for the first time, using the maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR) criteria, respectively. The experimental results show that the FSMN outperforms the DNN in Mongolian ASR and that, with combined i-vector and Fbank features as the FSMN input together with discriminative training, the word error rate (WER) is reduced by 17.9% relative to the DNN baseline.
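As a rough illustration of two ideas named in the abstract, the following NumPy sketch shows (a) one common way to combine a per-utterance i-vector with frame-level Fbank features, by tiling the i-vector across frames and concatenating, and (b) a unidirectional vectorized FSMN memory block, which summarizes past hidden activations with learnable tap coefficients rather than recurrent feedback. The abstract does not give the paper's configuration, so the function names, the feature dimensions, the memory order, and the choice of the unidirectional vectorized variant are all assumptions made for illustration.

```python
import numpy as np


def append_ivector(fbank, ivector):
    """Append one utterance-level i-vector to every Fbank frame.

    fbank:   (T, F) array of frame-level filterbank features.
    ivector: (K,) array, one speaker vector for the whole utterance.
    Returns a (T, F + K) array. Hypothetical helper, not the paper's code.
    """
    tiled = np.tile(ivector, (fbank.shape[0], 1))  # (T, K)
    return np.concatenate([fbank, tiled], axis=1)


def fsmn_memory(hidden, coeffs):
    """Unidirectional vectorized FSMN memory block (illustrative).

    Computes m_t = sum_{i=0}^{N} a_i * h_{t-i}, elementwise per dimension,
    treating frames before the start of the utterance as zeros.

    hidden: (T, D) hidden activations of one layer.
    coeffs: (N + 1, D) tap coefficients a_0 .. a_N (assumed already learned).
    Returns a (T, D) array of memory outputs.
    """
    T = hidden.shape[0]
    memory = np.zeros(hidden.shape, dtype=float)
    for i in range(min(coeffs.shape[0], T)):
        # Frame t receives the contribution of frame t - i, for all t >= i.
        memory[i:] += coeffs[i] * hidden[:T - i]
    return memory


# Usage with made-up sizes: 40-dim Fbank, 100-dim i-vector, memory order 1.
features = append_ivector(np.ones((5, 40)), np.arange(100.0))
memory = fsmn_memory(np.ones((4, 2)), np.array([[1.0, 1.0], [0.5, 0.5]]))
```

Because the memory block is a finite sum over a fixed window of past frames, it can be trained with standard back-propagation like any feed-forward layer, which is the sense in which long-term context is modeled "without recurrent feedback".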
