Published in: Computer Speech and Language

A study of voice activity detection techniques for NIST speaker recognition evaluations

Abstract

Since 2008, interview-style speech has become an important part of the NIST speaker recognition evaluations (SREs). Unlike telephone speech, interview speech has a lower signal-to-noise ratio (SNR), which calls for robust voice activity detectors (VADs). This paper highlights the characteristics of the interview speech files in NIST SREs and discusses the difficulties of performing speech/non-speech segmentation on them. To overcome these difficulties, the paper proposes speech enhancement as a pre-processing step that makes energy-based and statistical-model-based VADs more reliable. A decision strategy is also proposed to suppress the undesirable effects of impulsive signals and sinusoidal background signals. The proposed VAD is compared with the ASR transcripts provided by NIST, the VAD in the ETSI-AMR Option 2 coder, a statistical-model (SM) based VAD, and a Gaussian mixture model (GMM) based VAD. Experimental results on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms these conventional VADs whenever interview-style speech is involved. The study also shows that (1) noise reduction is vital for energy-based VAD at low SNR; (2) the ASR transcripts and the ETSI-AMR speech coder do not produce accurate speech/non-speech segmentations; and (3) spectral subtraction makes better use of background spectra than the likelihood-ratio tests in the SM-based VAD.
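The abstract does not give algorithmic details, so what follows is only a generic sketch of the idea it describes: power spectral subtraction used as a pre-processing step in front of an energy-threshold VAD. It is not the authors' VAD and it omits their decision strategy for impulsive and sinusoidal interference. The frame length, the over-subtraction factor `alpha`, the spectral floor `beta`, the rule that the quietest 10% of frames contain background only, and the 10 dB threshold margin are all hypothetical choices made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def frame_power_spectra(signal, frame_len=256):
    """Hann-windowed, non-overlapping frames -> one-sided power spectra |X(k)|^2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        bins = fft(frame)[: frame_len // 2 + 1]
        spectra.append([abs(c) ** 2 for c in bins])
    return spectra


def spectral_subtraction_vad(signal, frame_len=256, noise_frac=0.1,
                             alpha=2.0, beta=0.01, margin_db=10.0):
    """Return one boolean speech (True) / non-speech (False) decision per frame.

    Hypothetical parameters: alpha is the over-subtraction factor, beta the
    spectral floor, margin_db the threshold above the noise-only energy.
    """
    spectra = frame_power_spectra(signal, frame_len)
    if not spectra:
        return []
    # Assumption: the quietest noise_frac of frames contain background only.
    n_noise = max(1, int(len(spectra) * noise_frac))
    order = sorted(range(len(spectra)), key=lambda i: sum(spectra[i]))
    noise_idx = order[:n_noise]
    noise = [sum(spectra[i][k] for i in noise_idx) / n_noise
             for k in range(len(spectra[0]))]

    # Power spectral subtraction with a floor, then the energy of each enhanced frame.
    enhanced_energy = [
        sum(max(p - alpha * n, beta * n) for p, n in zip(spec, noise))
        for spec in spectra
    ]

    # Energy threshold: margin_db above the mean enhanced energy of the noise frames.
    ref = sum(enhanced_energy[i] for i in noise_idx) / n_noise
    threshold = ref * 10 ** (margin_db / 10)
    return [e > threshold for e in enhanced_energy]
```

For example, adding a 500 Hz tone for 0.64 s to 2 s of white noise sampled at 8 kHz should mark the tone frames as speech and nearly all other frames as non-speech. Estimating the noise spectrum from the quietest frames, rather than from a fixed leading segment, is one simple way to avoid assuming where the silence lies in a long interview recording.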