Journal: Journal of Signal Processing Systems for Signal, Image, and Video Technology

Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Abstract

Recently, several speaker adaptation methods have been proposed for deep neural networks (DNNs) in many large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few methods directly tune the connection weights of a trained DNN to optimize system performance, since doing so is very prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of a trained DNN and then tune the rectangular diagonal matrices with the adaptation data. Because only the singular values are modified, the weight matrices change only slightly, which alleviates the over-fitting problem. We evaluate the proposed adaptation method on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition on the Switchboard task. Experimental results show that the method is effective for adapting large DNN models using only a small amount of adaptation data. For example, recognition results on the Switchboard task show that the proposed SVD-based adaptation method may achieve a relative error reduction of up to 3-6% using only a few dozen adaptation utterances per speaker.
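The core idea of the abstract, factorizing a trained weight matrix with SVD and then updating only its singular values, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single linear layer, the squared-error loss, the synthetic data, and every name here (`svd_factorize`, `mse_loss`, `adapt_singular_values`, `s_target`) are assumptions made for illustration. The thin SVD is used, which keeps the same singular values as the rectangular diagonal matrix mentioned in the abstract.

```python
import numpy as np

def svd_factorize(W):
    """Thin SVD: W == (U * s) @ Vt, with s holding the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def mse_loss(U, s, Vt, X, Y):
    """Half mean squared error of the layer output X @ W.T against targets Y."""
    W = (U * s) @ Vt
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

def adapt_singular_values(U, s, Vt, X, Y, lr=0.1, steps=500):
    """Gradient descent on the singular values only; U and Vt stay frozen."""
    s = s.copy()
    for _ in range(steps):
        W = (U * s) @ Vt
        G = (X @ W.T - Y).T @ X / X.shape[0]         # dLoss/dW, shape (m, n)
        grad_s = np.einsum("ik,ij,kj->k", U, G, Vt)  # dLoss/ds_k = u_k^T G v_k
        s -= lr * grad_s
    return s

# Toy setup (assumption): a random 8x6 "trained" layer and a hypothetical
# speaker whose ideal layer differs only in its singular values.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
U, s, Vt = svd_factorize(W)
s_target = 1.1 * s

X = rng.standard_normal((64, 6))     # synthetic adaptation inputs
Y = X @ ((U * s_target) @ Vt).T      # synthetic adaptation targets

loss_before = mse_loss(U, s, Vt, X, Y)
s_adapted = adapt_singular_values(U, s, Vt, X, Y)
loss_after = mse_loss(U, s_adapted, Vt, X, Y)
```

In this toy case only 6 values are tuned instead of the 48 entries of `W`, which is the sense in which restricting updates to the singular values limits how far the adapted weights can drift. A real system would presumably backpropagate a classification loss through the whole network rather than use this squared-error stand-in.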
