International Conference on Speech and Computer

Reducing the Inter-speaker Variance of CNN Acoustic Models Using Unsupervised Adversarial Multi-task Training

Abstract

Although Deep Neural Network (DNN) technology has brought significant improvements in automatic speech recognition, it remains vulnerable to changing environmental conditions. Adversarial multi-task training was recently proposed to increase the domain and noise robustness of DNN acoustic models. Here, we apply this method to reduce the inter-speaker variance of a convolutional neural network-based speech recognition system. One drawback of the baseline method is that it requires speaker labels for the training dataset. Hence, we propose two modifications which allow the method to be applied in the unsupervised scenario, that is, when speaker annotation is not available. In the first case, we apply unsupervised speaker clustering based on a standard feature set; in the second, we modify the network structure to perform speaker discrimination in the manner of a Siamese DNN. In the supervised scenario we report a relative error rate reduction of 4%. The two unsupervised approaches achieve smaller but consistent improvements of about 3% on average.
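For orientation, here is a minimal PyTorch sketch of the gradient-reversal formulation commonly used to realize adversarial multi-task training, the general technique the abstract builds on. It is an illustration only, not the authors' implementation: the network shape, the reversal weight `lam`, and the dummy data are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips and scales the gradient on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The negated gradient pushes the shared features to become
        # uninformative about the adversarial (speaker) task.
        return -ctx.lam * grad_output, None

class AdversarialCNN(nn.Module):
    def __init__(self, n_states=2000, n_speakers=50, lam=0.1):
        super().__init__()
        self.lam = lam
        # Shared convolutional feature extractor over filter-bank patches.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.senone_head = nn.Linear(32 * 4 * 4, n_states)     # primary task
        self.speaker_head = nn.Linear(32 * 4 * 4, n_speakers)  # adversary

    def forward(self, x):
        h = self.features(x)
        return self.senone_head(h), self.speaker_head(GradReverse.apply(h, self.lam))

# One hypothetical training step on dummy data: the senone loss is minimized
# as usual, while the reversed gradient from the speaker loss drives the
# shared representation toward speaker invariance.
model = AdversarialCNN()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(8, 1, 40, 11)              # (batch, channel, mel bands, frames)
senone_y = torch.randint(0, 2000, (8,))    # frame-level state targets
speaker_y = torch.randint(0, 50, (8,))     # speaker labels, or cluster IDs
senone_logits, speaker_logits = model(x)
loss = F.cross_entropy(senone_logits, senone_y) \
     + F.cross_entropy(speaker_logits, speaker_y)
opt.zero_grad()
loss.backward()
opt.step()
```

In the unsupervised variants the abstract describes, `speaker_y` would come from unsupervised speaker clustering rather than annotation, or the speaker classification head would be replaced by a Siamese-style same/different-speaker discriminator.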
