Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement

Abstract

In this paper, a new training strategy is proposed to address a key issue in deep neural network (DNN) based speech enhancement: how to make effective use of limited data, given the growing awareness that more training data is needed in the deep learning era. Traditionally, a fixed training set consisting of a large number of paired utterances, i.e., clean speech and the corresponding noisy speech, must be prepared in advance. However, enlarging the amount of noisy speech used in training seems unavoidable if the model is to adapt to diverse noise environments. Moreover, using more training data lengthens training, because the fixed training set must be repeated over multiple epochs. In this study, we propose a novel training method based on dynamic data generation. The key idea is that the synthesis of noisy speech data is performed on the fly, from the utterance level to the batch level. This new training method offers three advantages. First, because training batches are generated dynamically, there is no need to prepare and store a fixed training set as in the conventional training method. Second, within the same training time as the conventional method, more abundant noisy data are fed into the DNN model. Finally, several evaluation measures, including PESQ, STOI, LSD, and SegSNR, improve consistently on unseen noise types, demonstrating the better generalization capability of the proposed training strategy.
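
The idea of synthesizing noisy speech on the fly at the batch level can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the mixing is done, so the time-domain additive mixing, the uniform random choice of clean utterance, noise recording, and SNR, and the names `mix_at_snr` and `dynamic_batches` are all assumptions made for illustration.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng):
    """Add a random segment of `noise` to `clean` at the requested SNR in dB.

    Assumption: simple additive mixing of time-domain waveforms.
    """
    # Repeat the noise if it is shorter than the clean utterance.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    # Pick a random noise segment of the same length as the utterance.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    # Scale the noise so that clean_power / scaled_noise_power matches the SNR.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2) + 1e-12  # guard against silent noise
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment

def dynamic_batches(clean_utts, noises, snr_choices, batch_size, num_batches, seed=0):
    """Yield (noisy, clean) batches synthesized on the fly.

    Hypothetical helper: no fixed noisy training set is stored; every batch
    draws fresh combinations of clean utterance, noise recording, and SNR.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        noisy_batch, clean_batch = [], []
        for _ in range(batch_size):
            clean = clean_utts[rng.integers(len(clean_utts))]
            noise = noises[rng.integers(len(noises))]
            snr_db = rng.choice(snr_choices)
            noisy_batch.append(mix_at_snr(clean, noise, snr_db, rng))
            clean_batch.append(clean)
        yield noisy_batch, clean_batch
```

In a training loop, each yielded batch would be converted to features and fed to the DNN directly, so only the clean utterances and the noise recordings need to be kept on disk, and the model rarely sees the same noisy utterance twice.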