Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement



Abstract

In this paper, a new training strategy is proposed to address a key issue in deep neural network (DNN) based speech enhancement: how to effectively utilize limited data, given the growing recognition in the deep learning era that more training data is needed. Traditionally, a fixed training set consisting of a large number of paired utterances, i.e., clean speech and the corresponding noisy speech, must be prepared in advance. However, enlarging the noisy speech data in the training stage seems inevitable if the model is to adapt to diverse noise environments. Moreover, involving more training data leads to longer training time, since the fixed training set must be repeated for multiple epochs. In this study, we propose a novel training method based on dynamic data generation. The key idea is that the synthesis of noisy speech data is conducted on the fly, moving from the utterance level to the batch level. Three advantages are gained from this new training method. First, by dynamically generating each training batch, it is not necessary to prepare and store a fixed training set as in the conventional training method. Second, within the same training time as the conventional method, more diverse noisy data are actually fed into the DNN model. Finally, different evaluation measures, including PESQ, STOI, LSD, and SegSNR, are consistently improved on unseen noise types, demonstrating the better generalization capability of the proposed training strategy.
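A minimal sketch of the on-the-fly noisy-batch synthesis described in the abstract, assuming clean utterances and noise recordings are available as NumPy waveform arrays; the function names, SNR range, segment length, and batch shape are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean-to-noise power ratio equals snr_db, then add it to the clean signal."""
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def dynamic_noisy_batch(clean_utts, noise_clips, batch_size=32, seg_len=16000,
                        snr_range=(-5.0, 15.0), rng=None):
    """Synthesize one (noisy, clean) training batch on the fly; no fixed noisy training set is stored."""
    rng = rng or np.random.default_rng()
    noisy = np.empty((batch_size, seg_len), dtype=np.float32)
    target = np.empty((batch_size, seg_len), dtype=np.float32)
    for i in range(batch_size):
        clean = clean_utts[rng.integers(len(clean_utts))]    # random clean utterance
        noise = noise_clips[rng.integers(len(noise_clips))]  # random noise recording
        # Random segment starts; assumes every waveform has at least seg_len samples.
        cs = rng.integers(0, len(clean) - seg_len + 1)
        ns = rng.integers(0, len(noise) - seg_len + 1)
        snr_db = rng.uniform(*snr_range)                     # fresh SNR for every example
        clean_seg = clean[cs:cs + seg_len]
        noisy[i] = mix_at_snr(clean_seg, noise[ns:ns + seg_len], snr_db)
        target[i] = clean_seg
    return noisy, target
```

Each call draws fresh clean/noise/SNR combinations, so the model never revisits a fixed noisy set; over many epochs the effective variety of noisy data grows without any extra storage, which is the core of the dynamic-generation strategy.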
