Conference: Annual Conference of the International Speech Communication Association

Data augmentation using multi-input multi-output source separation for deep neural network based acoustic modeling

Abstract

We investigate the use of local Gaussian modeling (LGM) based source separation to improve speech recognition accuracy. Previous studies have successfully applied LGM based source separation both to runtime speech enhancement and to the enhancement of training data for deep neural network (DNN) based acoustic modeling. In this paper, we propose a data augmentation method that exploits the multi-input multi-output (MIMO) characteristic of LGM based source separation. We first compare unprocessed multi-microphone signals with the multi-channel output signals of LGM based source separation as augmented training data for DNN based acoustic modeling. Experimental results on the third CHiME challenge dataset show that the proposed data augmentation outperforms the conventional data augmentation. In addition, we experiment with beamforming applied to the source-separated signals as runtime speech enhancement. The results show that the proposed runtime beamforming further improves speech recognition accuracy.
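
As a rough illustration of multi-channel data augmentation in general, the sketch below treats each channel of a multi-channel signal as its own training utterance that shares the original transcript. This is not the authors' implementation: the function name `augment_by_channel`, the array shapes, the sample rate, and the random stand-in signal are all assumptions made for the example, and the LGM separation step itself is not shown.

```python
import numpy as np


def augment_by_channel(multichannel, transcript):
    """Return one (waveform, transcript) pair per channel.

    `multichannel` is assumed to have shape (channels, samples). It could hold
    raw microphone signals or the multi-channel output of a source separator.
    """
    signals = np.atleast_2d(np.asarray(multichannel, dtype=float))
    # Each channel becomes an independent training example with the same label.
    return [(channel.copy(), transcript) for channel in signals]


# Hypothetical stand-in for separated output: 6 channels, 1 second at 16 kHz.
rng = np.random.default_rng(0)
separated = rng.standard_normal((6, 16000))

examples = augment_by_channel(separated, "hypothetical transcript")
print(len(examples), examples[0][0].shape)
```

Under these assumptions, a recording with C channels contributes C training utterances instead of one, which is the sense in which the multi-output property enlarges the training set.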