IEEE International Conference on Acoustics, Speech and Signal Processing

Semi-supervised Training for End-to-end Models via Weak Distillation



Abstract

End-to-end (E2E) models are a promising research direction in speech recognition, as a single all-neural E2E system offers a much simpler and more compact solution than a conventional model, which has separate acoustic (AM), pronunciation (PM), and language models (LM). However, it has been noted that E2E models perform poorly on tail words and proper nouns, likely because end-to-end optimization requires joint audio-text pairs and does not take advantage of the additional lexicons and large amounts of text-only data used to train the LMs in conventional models. There have been numerous efforts to train an RNN-LM on text-only data and fuse it into the end-to-end model. In this work, we contrast that approach with training the E2E model on audio-text pairs generated from unsupervised speech data. To target the proper-noun issue specifically, we adopt a Part-of-Speech (POS) tagger to filter the unsupervised data, keeping only utterances that contain proper nouns. We show that training with filtered unsupervised data provides up to a 13% relative reduction in word error rate (WER), and when used in conjunction with a cold-fusion RNN-LM, up to a 17% relative improvement.
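The filtering step the abstract describes can be sketched as follows. The paper uses a real POS tagger to keep only unsupervised utterances whose hypothesized transcripts contain proper nouns; in this minimal sketch, a simple capitalization heuristic stands in for the tagger, and the pair format and function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch of proper-noun filtering for unsupervised audio-text pairs.
# A capitalized, non-sentence-initial alphabetic token is treated as a
# proper noun -- a crude stand-in for a real POS tagger's NNP tag.

def contains_proper_noun(transcript: str) -> bool:
    """Heuristic NNP detection: any capitalized word after the first
    token is treated as a proper noun."""
    tokens = transcript.split()
    return any(tok[0].isupper() for tok in tokens[1:] if tok and tok[0].isalpha())

def filter_unsupervised_pairs(pairs):
    """Keep only (audio_id, transcript) pairs whose hypothesized
    transcript contains a proper noun, mirroring the filtering that
    targets tail words and proper nouns."""
    return [(audio, text) for audio, text in pairs if contains_proper_noun(text)]

pairs = [
    ("utt1", "play songs by Taylor Swift"),
    ("utt2", "turn up the volume"),
    ("utt3", "navigate to Mountain View"),
]
print(filter_unsupervised_pairs(pairs))
# keeps utt1 and utt3; utt2 has no proper noun
```

In practice a trained tagger (rather than a capitalization rule) would be applied to the ASR hypotheses, since hypothesized transcripts of unsupervised audio may not carry reliable casing.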

