Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks



Abstract

A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to sigmoid activation and min–max normalization in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, frame shift size in speech framing and acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms the conventional DNNs without taking the reverberation time into account, while achieving a performance only slightly worse than the oracle cases with known reverberation times even for extremely weak and severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
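The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lookup-table layout, and every number in `HYPOTHETICAL_TABLE` are assumptions made for illustration, and the abstract does not state which frame shift or context window size goes with which RT60. The sketch assumes log-power spectrum frames are already available as lists of floats and shows global mean-variance normalization of targets (invertible, so it pairs with a linear output layer), context-window splicing at the DNN input, and selection of the two parameters from an estimated RT60.

```python
import math


def global_mean_variance_normalize(frames):
    """Normalize each feature dimension using statistics pooled over all frames."""
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    std = []
    for d in range(dims):
        var = sum((f[d] - mean[d]) ** 2 for f in frames) / n
        std.append(math.sqrt(var) or 1.0)  # constant dimension: avoid dividing by zero
    normalized = [[(f[d] - mean[d]) / std[d] for d in range(dims)] for f in frames]
    return normalized, mean, std


def denormalize(frame, mean, std):
    """Map an unbounded (linear-output) prediction back to the log-power domain."""
    return [x * s + m for x, m, s in zip(frame, mean, std)]


def stack_context(frames, half_window):
    """Splice each frame with half_window neighbours per side; edge frames repeat."""
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        spliced = []
        for offset in range(-half_window, half_window + 1):
            idx = min(max(t + offset, 0), last)
            spliced.extend(frames[idx])
        stacked.append(spliced)
    return stacked


def select_params(rt60_seconds, table):
    """Pick (frame_shift, half_window) from the first row whose bound covers RT60.

    table: rows of (rt60_upper_bound_s, frame_shift_samples, half_window),
    sorted by ascending bound.
    """
    for upper, shift, half in table:
        if rt60_seconds <= upper:
            return shift, half
    return table[-1][1], table[-1][2]


# Placeholder values only, NOT taken from the paper.
HYPOTHETICAL_TABLE = [(0.3, 128, 2), (0.7, 128, 5), (float("inf"), 256, 7)]

if __name__ == "__main__":
    shift, half = select_params(0.5, HYPOTHETICAL_TABLE)
    toy_frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
    normalized, mean, std = global_mean_variance_normalize(toy_frames)
    dnn_input = stack_context(normalized, half)
    print(shift, half, len(dnn_input[0]))
```

In a real system the table would be tuned per RT60 range during training, as the abstract describes, and the RT60 value passed to `select_params` would come from a separate reverberation-time estimator.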
