SEMI-SUPERVISED TRAINING OF DEEP NEURAL NETWORKS

Workshop on Automatic Speech Recognition and Understanding

Abstract

In this paper we search for an optimal strategy for semi-supervised Deep Neural Network (DNN) training. We assume that a small part of the data is transcribed, while the majority is untranscribed. We explore self-training strategies with data selection based on both utterance-level and frame-level confidences. Further, we study the interactions between semi-supervised frame-discriminative training and sequence-discriminative sMBR training. We found it beneficial to reduce the disproportion between the amounts of transcribed and untranscribed data by including the transcribed data several times, and to perform frame selection based on per-frame confidences derived from the confusion in a lattice. For the experiments, we used the Limited language pack condition of the Surprise language task (Vietnamese) from the IARPA Babel program. The absolute Word Error Rate (WER) improvement for frame cross-entropy training is 2.2%; this corresponds to a WER recovery of 36% when compared to an identical system in which the DNN is built on the fully transcribed data.
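
As a concrete illustration of the self-training recipe described above, the following minimal NumPy sketch shows the two data-selection ideas: pseudo-labeling untranscribed frames with a seed model and keeping only frames whose per-frame confidence clears a threshold, and reducing the transcribed/untranscribed disproportion by repeating the transcribed subset. The helper names, the 0.7 threshold, and the repeat factor of 3 are illustrative assumptions, not values from the paper; in the actual system the per-frame confidences are derived from the confusion in a decoding lattice rather than from raw posteriors as here.

    import numpy as np

    def frame_selection(frame_posteriors, conf_threshold=0.7):
        # Pseudo-label each frame with its most likely state; keep only frames
        # whose confidence for that label clears the threshold. (Hypothetical
        # stand-in for the paper's lattice-derived per-frame confidences.)
        pseudo_labels = frame_posteriors.argmax(axis=1)
        confidences = frame_posteriors[np.arange(len(pseudo_labels)), pseudo_labels]
        keep = confidences >= conf_threshold
        return pseudo_labels, keep

    def mix_data(transcribed_utts, selected_untranscribed_utts, repeat_transcribed=3):
        # Reduce the disproportion between the small transcribed set and the
        # large untranscribed set by repeating the transcribed data.
        return transcribed_utts * repeat_transcribed + selected_untranscribed_utts

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy posteriors: 10 frames over 5 states.
        posteriors = rng.dirichlet(np.full(5, 0.3), size=10)
        labels, keep = frame_selection(posteriors)
        print(f"kept {keep.sum()} of {len(keep)} frames")
        print(mix_data(["utt_a", "utt_b"], ["utt_x"]))

The WER recovery figure can be read as the fraction of the oracle gap closed by semi-supervised training: recovery = (WER_seed - WER_semi) / (WER_seed - WER_oracle), where the oracle is the identical system trained on the fully transcribed data; under this reading, the 2.2% absolute gain corresponds to 36% of that gap.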
