Journal: Computer Speech and Language

rVAD: An unsupervised segment-based robust voice activity detection method

Abstract

This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected using an a posteriori signal-to-noise ratio (SNR) weighted energy difference; if no pitch is detected within a segment, the segment is considered a high-energy noise segment and is set to zero. In the second pass, the speech signal is denoised by a speech enhancement method, for which several methods are explored. Next, neighbouring frames with pitch are grouped together to form pitch segments, and, based on speech statistics, the pitch segments are extended at both ends so that they include both voiced and unvoiced sounds, and likely some non-speech parts as well. Finally, the a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal to detect voice activity. We evaluate the VAD performance of the proposed method on two databases, RATS and Aurora-2, which contain a large variety of noise conditions. The rVAD method is further evaluated, in terms of speaker verification performance, on the RedDots 2016 challenge database and its noise-corrupted versions. Experimental results show that rVAD compares favourably with a number of existing methods. In addition, we present a modified version of rVAD in which the computationally intensive pitch extraction is replaced by a computationally efficient spectral flatness calculation. The modified version trades moderately inferior VAD performance for significantly lower computational complexity, which is an advantage when processing large amounts of data or running on low-resource devices. The source code of rVAD is made publicly available. © 2019 Elsevier Ltd. All rights reserved.
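Two of the building blocks named in the abstract, the spectral flatness calculation and the grouping and extension of flagged frames into segments, can be illustrated with a minimal Python sketch. This is not the authors' released implementation: the function names `spectral_flatness` and `group_segments`, the Hann window, the frame length, and the `extend` margin are all assumptions chosen for illustration. Spectral flatness is taken here in its usual sense, the ratio of the geometric mean to the arithmetic mean of the power spectrum.

```python
import numpy as np


def spectral_flatness(frame, eps=1e-12):
    """Ratio of geometric to arithmetic mean of the power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum. The Hann window and `eps` floor
    are illustrative choices, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    power = np.abs(spectrum) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    return float(geometric_mean / np.mean(power))


def group_segments(flags, extend=0):
    """Group runs of True frames into half-open [start, end) segments.

    Each segment is widened by `extend` frames at both ends (clipped to
    the signal length), and segments that touch or overlap after
    widening are merged. `extend` is a hypothetical fixed margin; the
    paper derives its extension from speech statistics.
    """
    n = len(flags)
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n))

    merged = []
    for s, e in runs:
        s, e = max(0, s - extend), min(n, e + extend)
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(256) / 8000.0  # 256 samples at an assumed 8 kHz rate
    tone = np.sin(2 * np.pi * 440.0 * t)
    noise = rng.standard_normal(256)
    print("flatness (tone): ", spectral_flatness(tone))
    print("flatness (noise):", spectral_flatness(noise))

    flags = [False, True, True, False, False, False, True, False]
    print(group_segments(flags, extend=1))
```

In a pipeline of the kind the abstract describes, frames with low flatness could stand in for frames "with pitch", and the resulting flags would be passed to `group_segments`; the threshold that separates low from high flatness would need to be tuned and is deliberately left out here.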
