Workshop on Computational Approaches to Code Switching

Semi-supervised Acoustic and Language Model Training for English-isiZulu Code-Switched Speech Recognition



Abstract

We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions were incorporated into the acoustic and language model training sets. Results showed that the TDNN-F acoustic models benefit from the additional semi-supervised data and that even better performance could be achieved by including additional CNN layers. Using these CNN-TDNN-F acoustic models, a first iteration of semi-supervised training achieved an absolute mixed-language WER reduction of 3.4%, and a further 2.2% after a second iteration. Although the languages in the untranscribed data were unknown, the best results were obtained when all automatically transcribed data was used for training and not just the utterances classified as English-isiZulu. Despite reducing perplexity, the semi-supervised language model was not able to improve the ASR performance.
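The core procedure described above is semi-supervised (pseudo-label) training: decode untranscribed audio with the current system, add the automatic transcriptions to the training pool, and retrain, repeating for a second iteration. The sketch below illustrates that loop only in outline; the toy `train` and `decode` functions are hypothetical stand-ins, not the authors' Kaldi TDNN-F recipes.

```python
from collections import Counter


def train(data):
    # Toy "model": word-frequency counts over the transcripts.
    # A real system would train acoustic and language models here.
    model = Counter()
    for _audio, transcript in data:
        model.update(transcript.split())
    return model


def decode(model, audio):
    # Toy "decoder": the audio is represented as a string we pretend
    # to transcribe. A real decoder would run the trained models.
    return audio


def semi_supervised_train(labelled, untranscribed, iterations=2):
    """Pseudo-label loop: decode unlabelled data, retrain, repeat."""
    model = train(labelled)
    for _ in range(iterations):
        pseudo = [(utt, decode(model, utt)) for utt in untranscribed]
        # As in the paper, ALL automatically transcribed data is kept,
        # not just utterances classified as English-isiZulu.
        model = train(labelled + pseudo)
    return model
```

A usage example: starting from one labelled utterance and one untranscribed one, the pseudo-labelled utterance ends up contributing to the retrained model.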
机译:我们介绍了使用肥皂剧语音对英语-isiZulu代码转换的ASR进行半监督的声学和语言模型训练的分析。使用四个以英语-isiZulu,英语-isiXhosa,英语-Setswana和英语-Sesotho运行的双语代码转换转录系统,自动转录了大约11个小时的未转录多语言语音。这些转录被并入声学和语言模型训练集中。结果表明,TDNN-F声学模型受益于附加的半监督数据,并且通过包含附加的CNN层,甚至可以实现更好的性能。使用这些CNN-TDNN-F声学模型,半监督训练的第一次迭代实现了绝对混合语言WER降低3.4%,第二次迭代后进一步降低了2.2%。尽管未转录数据中的语言是未知的,但是当所有自动转录的数据都用于训练而不仅仅是分类为English-isiZulu的语音时,可以获得最佳结果。尽管减少了困惑,但半监督语言模型仍无法提高ASR性能。
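The reported gains are in word error rate (WER), the word-level Levenshtein distance between hypothesis and reference divided by the reference length; an absolute reduction of 3.4% means, for example, a drop from 45.0% to 41.6% WER. A minimal reference implementation of the metric (not the authors' scoring pipeline):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substitution in a four-word reference gives a WER of 0.25.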
