Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems

Abstract

In this paper, we study aspects of single-microphone speech enhancement (SE) based on deep neural networks (DNNs). Specifically, we explore the generalization capabilities of state-of-the-art DNN-based SE systems with respect to the background noise type, the gender of the target speaker, and the signal-to-noise ratio (SNR). Furthermore, we investigate how specialized DNN-based SE systems, which have been trained to be either noise-type-specific, speaker-specific, or SNR-specific, perform relative to DNN-based SE systems that have been trained to be noise-type-general, speaker-general, and SNR-general. We also compare how a DNN-based SE system trained to be noise-type-general, speaker-general, and SNR-general performs relative to a state-of-the-art short-time spectral amplitude minimum mean square error (STSA-MMSE) based SE algorithm. We show that DNN-based SE systems, when trained specifically to handle certain speakers, noise types, and SNRs, are capable of achieving large improvements in estimated speech quality (SQ) and speech intelligibility (SI) when tested in matched conditions. Furthermore, we show that a DNN-based SE system can achieve improvements in estimated SQ and SI when exposed to unseen speakers, genders, and noise types, provided that a large number of speakers and noise types were used to train the system. In addition, we show that a DNN-based SE system trained using a large number of speakers and a wide range of noise types outperforms a state-of-the-art STSA-MMSE-based SE method when tested on a range of unseen speakers and noise types. Finally, a listening test using several DNN-based SE systems tested in unseen-speaker conditions shows that these systems can improve SI for some SNR and noise-type configurations but degrade SI for others.
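The abstract distinguishes SNR-specific from SNR-general training. A common way to build such training data, assumed here rather than taken from the paper, is to scale a noise recording so that the noisy mixture hits a target SNR, where SNR = 10 log10(P_speech / P_noise) and P is mean signal power. The sketch below does this in plain Python. The function names, and the -5 to 10 dB range used for SNR-general sampling, are hypothetical illustrations, not the paper's settings.

```python
import math
import random


def mean_power(signal):
    """Mean of the squared samples of a 1-D signal."""
    return sum(s * s for s in signal) / len(signal)


def snr_db(speech, noise):
    """SNR in dB between a clean signal and a noise signal of equal length."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it to `speech`.

    Assumes `noise` has at least len(speech) samples and is not all zeros;
    only its first len(speech) samples are used.
    """
    noise = noise[:len(speech)]
    # Noise power needed so that 10*log10(P_speech / P_noise) == target_snr_db.
    wanted_noise_power = mean_power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def sample_training_snr(rng, specialized_snr_db=None, snr_range_db=(-5.0, 10.0)):
    """Hypothetical policy: a fixed SNR for an SNR-specific model, otherwise a
    uniform draw over an assumed range for an SNR-general model."""
    if specialized_snr_db is not None:
        return specialized_snr_db
    return rng.uniform(*snr_range_db)


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    snr = sample_training_snr(rng)  # SNR-general draw
    noisy, scaled = mix_at_snr(speech, noise, snr)
    print(f"target {snr:.2f} dB, achieved {snr_db(speech, scaled):.2f} dB")
```

Passing a fixed `specialized_snr_db` mimics an SNR-specific training set; leaving it unset spreads training mixtures over a range, which is one plausible reading of SNR-general training. Noise-type and speaker generality would be handled analogously by sampling from many noise recordings and speakers.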
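For context on the STSA-MMSE baseline family, the classic Ephraim-Malah (1984) STSA-MMSE estimator multiplies each noisy STFT magnitude by a gain G(xi, gamma) = Gamma(1.5) * sqrt(v)/gamma * exp(-v/2) * [(1 + v) I0(v/2) + v I1(v/2)], with v = xi * gamma / (1 + xi), where xi is the a priori SNR, gamma the a posteriori SNR, and I0, I1 modified Bessel functions of the first kind. This is an assumption about the baseline's family only: the abstract does not state which STSA-MMSE variant (speech prior, a priori SNR estimator) the paper uses. The helper names below are hypothetical, and the Bessel terms are computed in pure Python to keep the sketch dependency-free.

```python
import math


def scaled_bessel_i(order, x, max_terms=100_000):
    """Return exp(-x) * I_order(x) for order 0 or 1 and x >= 0.

    Sums the power series I_n(x) = sum_k (x/2)^(2k+n) / (k! (k+n)!) with each
    term formed in log space and pre-multiplied by exp(-x), so moderately large
    x does not overflow.
    """
    if order not in (0, 1) or x < 0.0:
        raise ValueError("order must be 0 or 1 and x must be non-negative")
    if x == 0.0:
        return 1.0 if order == 0 else 0.0
    log_half_x = math.log(x / 2.0)
    total = 0.0
    for k in range(max_terms):
        log_term = ((2 * k + order) * log_half_x
                    - math.lgamma(k + 1) - math.lgamma(k + order + 1) - x)
        term = math.exp(log_term)
        total += term
        # Terms peak near k = x/2; stop once past the peak and negligible.
        if k > x and term < 1e-17 * total:
            break
    return total


def stsa_mmse_gain(xi, gamma):
    """Ephraim-Malah STSA-MMSE gain for one time-frequency bin.

    xi:    a priori SNR (linear, > 0)
    gamma: a posteriori SNR |Y|^2 / noise variance (linear, > 0)
    """
    v = xi * gamma / (1.0 + xi)
    half_v = v / 2.0
    bracket = (1.0 + v) * scaled_bessel_i(0, half_v) + v * scaled_bessel_i(1, half_v)
    # Gamma(1.5) = sqrt(pi) / 2; exp(-v/2) is already folded into the scaled Bessel terms.
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * bracket


def estimate_amplitude(noisy_magnitude, xi, gamma):
    """Clean-amplitude estimate: the gain applied to the noisy STFT magnitude."""
    return stsa_mmse_gain(xi, gamma) * noisy_magnitude


if __name__ == "__main__":
    # At high SNR the gain approaches the Wiener gain xi / (1 + xi).
    print(stsa_mmse_gain(10.0, 200.0), 10.0 / 11.0)
```

In a full enhancer, xi is not known and must itself be estimated per bin (the decision-directed approach is a common choice), and the enhanced magnitudes are recombined with the noisy phase before the inverse STFT. A DNN-based SE system, by contrast, learns the mapping from noisy to enhanced features directly from training data.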