Journal: Signal, Image and Video Processing

Feature classification criterion for missing features mask estimation in robust speaker recognition


Abstract

Many speaker recognition applications must handle speech corrupted by additive environmental noise without a priori knowledge of the noise characteristics. Some previous work in speaker recognition has used the missing feature (MF) approach to compensate for noise. In most of those applications, the spectral reliability decision is made using the signal-to-noise ratio (SNR) criterion, which attempts to measure directly the relative energy of signal and noise at each frequency. An alternative approach to spectral data reliability has been used with some success in the MF approach to speech recognition. Here, we compare this new criterion with the SNR criterion for MF mask estimation in speaker recognition. The new reliability decision is based on extracting and analyzing several spectro-temporal features across the entire speech frame, but not across time, which highlight the differences between spectral regions dominated by speech and those dominated by noise. We call it the feature classification (FC) criterion. It uses several spectral features to establish spectrogram reliability, unlike the SNR criterion, which relies on only one feature: the SNR. We evaluated our proposal through speaker verification experiments on the Ahumada speech database corrupted by different types of noise at various SNR levels. The experiments showed that the FC criterion achieves considerably better recognition accuracy than the SNR criterion in the speaker verification tasks tested.
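For readers unfamiliar with the baseline, the SNR criterion can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract says only that the criterion measures relative signal and noise energy at each frequency, so the function name `snr_mask`, the stationary per-band noise estimate, the spectral-subtraction estimate of speech power, and the 0 dB default threshold are all assumptions made for illustration.

```python
import math

def snr_mask(noisy_power, noise_power, threshold_db=0.0):
    """Binary reliability mask from a local SNR estimate (illustrative only).

    noisy_power: list of frames, each a list of per-band power values.
    noise_power: one list of per-band noise power estimates (assumed stationary).
    Returns a mask of the same shape: 1 = reliable, 0 = missing.
    """
    eps = 1e-12  # avoids log(0) and division by zero
    mask = []
    for frame in noisy_power:
        row = []
        for y, n in zip(frame, noise_power):
            # Assumed estimate of speech power: spectral subtraction, floored at 0.
            s = max(y - n, 0.0)
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            # A cell is marked reliable when its local SNR exceeds the threshold.
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask

# Two frames, two frequency bands, unit noise power in each band.
mask = snr_mask([[10.0, 1.0], [1.5, 4.0]], [1.0, 1.0])
print(mask)
```

Each time-frequency cell is decided from a single quantity, the local SNR. This is the limitation that the FC criterion is described as addressing by using several spectral features instead.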
