A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Bo Wu; Kehuang Li; Minglei Yang; Chin-Hui Lee


Abstract

A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to the sigmoid activation and min–max normalization used in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely the frame shift size in speech framing and the acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms conventional DNNs that do not take the reverberation time into account, while performing only slightly worse than the oracle cases with known reverberation times, even under extremely weak and extremely severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
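The feature pipeline outlined in the abstract (RT60-dependent frame shift and context window, log-power spectrum features, global mean-variance normalization, and a linear output that is mapped back to the log-power domain) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 512-sample frame length, the Hamming window, the per-bin normalization statistics, and above all the RT60 thresholds and parameter values in `_PLACEHOLDER_TABLE` are assumptions made for the example and are not taken from the paper. The DNN itself and the RT60 estimator are left out.

```python
import numpy as np

# Hypothetical lookup: (RT60 upper bound in seconds, frame shift in samples,
# context half-width in frames). Values AND their trend are placeholders only.
_PLACEHOLDER_TABLE = [(0.3, 256, 3), (0.8, 128, 5), (float("inf"), 64, 7)]


def select_params(rt60_s):
    """Pick (frame_shift, half_width) for an estimated RT60 (assumed table)."""
    for upper, frame_shift, half_width in _PLACEHOLDER_TABLE:
        if rt60_s <= upper:
            return frame_shift, half_width
    raise ValueError("RT60 must be a finite, comparable number")


def log_power_spectrum(signal, frame_len, frame_shift, eps=1e-10):
    """Frame the signal, window it, and return log-power spectra (frames x bins)."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spectrum) ** 2 + eps)


def global_mvn(features, mean, std):
    """Normalize with statistics pooled over the whole training set."""
    return (features - mean) / std


def inverse_global_mvn(normalized, mean, std):
    """Map an unbounded (linear-output) prediction back to log-power values."""
    return normalized * std + mean


def splice_context(features, half_width):
    """Concatenate each frame with half_width neighbours on both sides.

    Edge frames are padded by repeating the first or last frame.
    """
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    n_frames = features.shape[0]
    return np.concatenate(
        [padded[k:k + n_frames] for k in range(2 * half_width + 1)], axis=1
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reverberant = rng.standard_normal(16000)  # stand-in for 1 s of audio at 16 kHz
    frame_shift, half_width = select_params(0.5)  # 0.5 s: pretend RT60 estimate
    lps = log_power_spectrum(reverberant, 512, frame_shift)
    # In practice the mean and std would come from the training data.
    mean, std = lps.mean(axis=0), lps.std(axis=0)
    dnn_input = splice_context(global_mvn(lps, mean, std), half_width)
    print(dnn_input.shape)
```

Because mean-variance normalized targets are not confined to a fixed interval, a linear output layer can represent them directly, whereas a sigmoid output would require targets squeezed into (0, 1), as with min–max normalization; `inverse_global_mvn` shows the corresponding step back to log-power values.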


Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE/ACM Transactions on | 2017, Issue 1 | pp. 98-107 | 10 pages
Authors
Bo Wu; Kehuang Li; Minglei Yang; Chin-Hui Lee
Author affiliations

National Laboratory of Radar Signal Processing, Xidian University, Xi’an, China;

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA;

National Laboratory of Radar Signal Processing, Xidian University, Xi’an, China;

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA;

Original format: PDF
Language: English
Keywords
Speech; Reverberation; Training; Context; Feature extraction; Speech processing;


