Journal of signal processing systems for signal, image, and video technology

Robust Voice Activity Detection Based on Concept of Modulation Transfer Function in Noisy Reverberant Environments

Abstract

Voice activity detection (VAD) identifies speech and non-speech periods in observed speech signals. It is an important front-end technique for many speech technology applications. Many VAD methods have been proposed. However, most of them have been applied under clean or noisy conditions. Only a few methods have been proposed for reverberant conditions, and in particular for noisy reverberant conditions. Designing an accurate and robust VAD method for noisy reverberant conditions therefore requires an understanding of the ill effects of noise and reverberation on speech. These effects can be described by the modulation transfer function (MTF) under noisy and reverberant conditions. Our study therefore builds on the MTF concept to reduce the ill effects of noise and reverberation on speech, and proposes a robust VAD method. In this method, noise reduction and dereverberation were first applied to the temporal power envelope of the speech signal to restore that envelope. Power thresholding on the restored temporal power envelope was then designed as the VAD decision. A method of estimating the signal-to-noise ratio (SNR) was also proposed so that the SNR could be estimated accurately in the noise reduction stage. Experiments under both artificial and realistic noisy reverberant conditions were carried out to evaluate the proposed VAD method and to compare it with conventional VAD methods. The results revealed that the proposed method significantly outperformed the conventional methods under both artificial and realistic noisy reverberant conditions.
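
The abstract gives no formulas or code, so the following is only a minimal sketch of two ideas it names, not the authors' implementation. The function `mtf` assumes the classic Houtgast-Steeneken form of the MTF for an exponentially decaying room response plus stationary noise; whether the paper uses exactly this form is an assumption. The functions `power_envelope` and `vad_by_threshold`, the 20 ms smoothing window, and the -20 dB relative threshold are hypothetical choices for illustration. The noise reduction and dereverberation (envelope restoration) steps of the proposed method are not reproduced here.

```python
import math
import random

def mtf(fm_hz, t60_s, snr_db):
    """Assumed MTF: reverberation term times noise term (Houtgast-Steeneken form)."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * fm_hz * t60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

def power_envelope(x, fs, win_s=0.02):
    """Temporal power envelope: squared samples smoothed by a centered moving average."""
    half = max(1, int(win_s * fs)) // 2
    # Prefix sums of squared samples make each window average O(1).
    csum = [0.0]
    for v in x:
        csum.append(csum[-1] + v * v)
    env = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        env.append((csum[hi] - csum[lo]) / (hi - lo))
    return env

def vad_by_threshold(env, rel_db=-20.0):
    """Label a sample as speech when its envelope exceeds a threshold rel_db below the peak."""
    thr = max(env) * 10.0 ** (rel_db / 10.0)
    return [e > thr for e in env]

# Synthetic test signal (hypothetical): a 200 Hz tone burst from 0.4 s to 0.6 s
# standing in for speech, embedded in weak Gaussian noise.
fs = 8000
rng = random.Random(0)
x = []
for i in range(fs):
    t = i / fs
    tone = math.sin(2.0 * math.pi * 200.0 * t) if 0.4 <= t < 0.6 else 0.0
    x.append(tone + 0.01 * rng.gauss(0.0, 1.0))

decisions = vad_by_threshold(power_envelope(x, fs))
```

In this sketch the threshold is taken relative to the envelope peak, which only works when the recording contains some speech. A method meant for noisy reverberant conditions would apply the decision to a restored envelope instead of the raw one, as the abstract describes.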
