Conference: International Conference on Instrumentation, Measurement, Circuits and Systems

Comparison of Frequency-Warped Filter Banks in relation to Robust Features for Speaker Identification

Abstract

Psychoacoustically motivated frequency warping, such as mel-scale warping, is common in speaker recognition; it was first applied in speech recognition. Mel-warped cepstral coefficients (MFCCs) are used as the standard acoustic feature set in state-of-the-art speaker recognition systems. Alternative frequency-warping techniques, such as the Bark and ERB-rate scales, can perform comparably to mel-scale warping. This paper investigates the performance of acoustic features generated by filter banks with Bark and ERB-rate warping, in relation to robust features for speaker identification. For this purpose, a sensor-mismatched database is used for closed-set text-dependent and text-independent identification. Because MFCCs are highly sensitive to mismatched conditions (any mismatch between the data used for training and for evaluation), spectral subtraction is applied to the mismatched speech data to reduce additive noise. Feature vectors are also normalized over each frame to compensate for channel mismatch. Experimental analysis shows that, in the text-dependent case under mismatched conditions, the percentage identification rates obtained with mel-, Bark-, and ERB-warped filter banks are nearly the same. However, for text-independent speaker identification under the same sensor-mismatched condition, ERB-rate-warped filter-bank features perform better than mel- and Bark-warped features. It is also observed that, without any compensation (spectral subtraction or cepstral normalization), Bark-warped filter-bank features give somewhat better speaker identification results in both the text-dependent and text-independent cases.
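
The abstract compares mel, Bark, and ERB-rate warping but does not give the conversion formulas or filter-bank settings used. The following is a minimal Python sketch, assuming common textbook conversions (the 2595·log10(1 + f/700) mel formula, Traunmüller's 1990 Bark approximation without its end corrections, and Glasberg and Moore's 1990 ERB-rate formula). The helper `warped_filterbank` is hypothetical: it places triangular filters whose edges are equally spaced on the chosen scale, so switching scales changes only where the filters sit and how wide they are.

```python
import numpy as np

# Hz <-> warped-scale conversions. These are assumed textbook forms; the
# paper's exact variants are not stated in the abstract.
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def hz_to_bark(f):
    # Traunmüller (1990) Bark approximation, without low/high-end corrections
    f = np.asarray(f, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)

def hz_to_erb_rate(f):
    # Glasberg & Moore (1990) ERB-rate: number of ERBs below frequency f
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f, dtype=float))

def erb_rate_to_hz(e):
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437

SCALES = {
    "mel": (hz_to_mel, mel_to_hz),
    "bark": (hz_to_bark, bark_to_hz),
    "erb": (hz_to_erb_rate, erb_rate_to_hz),
}

def warped_filterbank(scale, n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters with edges equally spaced on the chosen warped scale.

    Returns an (n_filters, n_fft // 2 + 1) matrix to apply to a power spectrum.
    """
    to_scale, from_scale = SCALES[scale]
    f_max = sample_rate / 2.0 if f_max is None else f_max
    # n_filters + 2 edge points: each filter spans edges i, i + 1 (peak), i + 2
    edges_hz = from_scale(np.linspace(to_scale(f_min), to_scale(f_max), n_filters + 2))
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # Triangle: 1 at the centre, 0 outside [lo, hi]
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb
```

For example, `warped_filterbank("erb", 24, 512, 16000)` returns a 24 × 257 weight matrix; the filter count, FFT size, and sample rate here are illustrative, not values reported by the paper.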

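Spectral subtraction, which the abstract applies to the mismatched speech to reduce additive noise, can be sketched as below. The variant, parameter values, and noise estimate are assumptions, not taken from the paper: power-domain subtraction with an over-subtraction factor `alpha` and a spectral floor `beta`, with the noise estimated from the first few frames, which are assumed to contain no speech. `power_spectrogram` and `spectral_subtraction` are hypothetical helper names.

```python
import numpy as np


def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Hamming-windowed short-time power spectrum, shape (n_frames, n_fft // 2 + 1).

    Assumes len(signal) >= frame_len; 400/160 samples are 25 ms / 10 ms at 16 kHz.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft zero-pads each frame to n_fft samples
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2


def spectral_subtraction(power_spec, noise_frames=10, alpha=2.0, beta=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    Hypothetical parameters: the first `noise_frames` frames are assumed to be
    speech-free and give the noise estimate; `alpha` over-subtracts the noise,
    and `beta` sets a floor relative to the noise estimate so no bin goes negative.
    """
    power_spec = np.asarray(power_spec, dtype=float)
    noise_psd = power_spec[:noise_frames].mean(axis=0)
    cleaned = power_spec - alpha * noise_psd
    return np.maximum(cleaned, beta * noise_psd)
```

The cleaned power spectrogram would then be weighted by whichever warped filter bank is under comparison. As long as the noise estimate is nonzero, the floor keeps every bin positive, so the later logarithm stays finite.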

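The warped cepstral coefficients follow the usual MFCC recipe with only the filter bank changed: log filter-bank energies followed by a DCT. The abstract says only that feature vectors are normalized to compensate for channel mismatch, so this sketch assumes cepstral mean and variance normalization (CMVN) over the frames of one utterance, along with 13 coefficients and an orthonormal DCT-II. All function names are hypothetical, and `filterbank` can be any (n_filters, n_bins) weight matrix, such as a mel-, Bark-, or ERB-warped one.

```python
import numpy as np


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; row k gives cepstral coefficient k."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)  # scale the k = 0 row so the basis is orthonormal
    return mat


def warped_cepstra(power_spec, filterbank, n_ceps=13, eps=1e-10):
    """Cepstra from a (n_frames, n_bins) power spectrum and any warped filter bank."""
    energies = power_spec @ filterbank.T          # (n_frames, n_filters)
    log_e = np.log(energies + eps)                # eps guards against log(0)
    return log_e @ dct_ii_matrix(n_ceps, filterbank.shape[0]).T


def cmvn(ceps, eps=1e-10):
    """Cepstral mean and variance normalization across one utterance's frames."""
    mu = ceps.mean(axis=0)
    sigma = ceps.std(axis=0)
    return (ceps - mu) / (sigma + eps)
```

A frequency-independent channel gain multiplies every filter-bank energy by the same factor, which after the logarithm and DCT shifts only the first cepstral coefficient by a constant; subtracting the per-utterance mean removes that shift, which is the intuition behind cepstral normalization for channel compensation.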