Published in: Journal of Zhejiang University Science

Mismatched feature detection with finer granularity for emotional speaker recognition

Abstract

The shapes of a speaker's vocal organs change with emotional state. This causes the acoustic space of short-time features under emotion to deviate from the neutral acoustic space, which degrades speaker recognition performance. Features that deviate greatly from the neutral acoustic space are regarded as mismatched features, and they harm speaker recognition systems. Emotion variation deforms features differently for different phonemes, so it is reasonable to build a finer-grained model that detects mismatched features within each phoneme. Phoneme recognition is itself difficult, however, so three kinds of acoustic class recognition are proposed to replace it: phoneme classes, a Gaussian mixture model (GMM) tokenizer, and a probabilistic GMM tokenizer. We then propose feature pruning and feature regulation methods that process the mismatched features to improve speaker recognition performance. For feature regulation, the transformation matrix that regulates the mismatched features is trained with a strategy that maximizes the between-class distance and minimizes the within-class distance. Experiments on the Mandarin affective speech corpus (MASC) compare the methods with the baseline GMM-UBM (universal background model) algorithm. Feature pruning and feature regulation increase the identification rate (IR) by 3.64% and 6.77%, respectively. When applied to the state-of-the-art i-vector algorithm, the two methods yield corresponding IR increases of 2.09% and 3.32%.
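
The sketch below illustrates, in rough form, how acoustic classes and mismatched-feature pruning might fit together. A GMM trained on neutral speech acts as a tokenizer: each frame is assigned to its best-scoring component, which plays the role of an acoustic class. A frame is flagged as mismatched when its log-likelihood under that class falls below a per-class threshold, and pruning then drops the flagged frames.

Everything specific here is an assumption, not the paper's method:

- the GMM parameters and diagonal covariances,
- the threshold rule for flagging mismatch,
- the threshold values,
- all function names.

The abstract does not say how mismatch is actually scored.

```python
import math

# Hypothetical neutral-speech GMM: each (weight, mean, variance) component is
# treated as one acoustic class. Diagonal covariances; values are illustrative.
NEUTRAL_GMM = [
    (0.5, [0.0, 0.0], [1.0, 1.0]),
    (0.5, [5.0, 5.0], [1.0, 1.0]),
]
# Hypothetical per-class log-likelihood thresholds; below them a frame is "mismatched".
THRESHOLDS = [-6.0, -6.0]


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_scores(frame, gmm):
    """Weighted log-likelihood of the frame under every GMM component."""
    return [math.log(w) + log_gauss_diag(frame, m, v) for w, m, v in gmm]


def tokenize(frame, gmm):
    """Hard GMM tokenizer: index of the best-scoring component (the acoustic class)."""
    scores = component_scores(frame, gmm)
    return max(range(len(scores)), key=scores.__getitem__)


def class_posteriors(frame, gmm):
    """Soft (probabilistic) tokenizer: component posteriors via a stable softmax."""
    scores = component_scores(frame, gmm)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_mismatched(frame, gmm, thresholds):
    """Flag a frame whose likelihood under its own class is below that class's threshold."""
    k = tokenize(frame, gmm)
    _, mean, var = gmm[k]
    return log_gauss_diag(frame, mean, var) < thresholds[k]


def prune(frames, gmm, thresholds):
    """Feature pruning: keep only the frames judged to match the neutral space."""
    return [f for f in frames if not is_mismatched(f, gmm, thresholds)]


# Toy usage: the third frame lies far from both neutral classes and is pruned.
frames = [[0.1, -0.2], [5.2, 4.9], [2.5, 9.0]]
kept = prune(frames, NEUTRAL_GMM, THRESHOLDS)
```

`class_posteriors` shows the soft class assignment a probabilistic tokenizer would produce. The abstract does not state how those weights feed into mismatch detection, so the pruning step here relies only on the hard tokenizer.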
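
The feature regulation step trains a transformation matrix by maximizing between-class distance while minimizing within-class distance. The abstract does not give the exact objective. A familiar way to formalize that criterion is Fisher's linear discriminant analysis (LDA), whose transform columns are the leading eigenvectors of S_w^-1 S_b.

The sketch below assumes:

- the LDA criterion;
- 2-D features, so the eigenvectors have a closed form;
- speaker identities as the classes.

The function names and toy data are hypothetical. It only learns the matrix and applies it to a frame; how the paper restricts the transform to mismatched frames is not described in the abstract.

```python
import math


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter_matrices(classes):
    """Within-class (S_w) and between-class (S_b) scatter of 2-D features.

    `classes` maps a class label (assumed here: a speaker ID) to its feature vectors.
    """
    mu = centroid([p for pts in classes.values() for p in pts])
    sw = [[0.0, 0.0], [0.0, 0.0]]
    sb = [[0.0, 0.0], [0.0, 0.0]]
    for pts in classes.values():
        mc = centroid(pts)
        for p in pts:
            d = (p[0] - mc[0], p[1] - mc[1])
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
        g = (mc[0] - mu[0], mc[1] - mu[1])
        for i in range(2):
            for j in range(2):
                sb[i][j] += len(pts) * g[i] * g[j]
    return sw, sb


def lda_transform_2d(classes):
    """Return (W, eigenvalues); columns of W are eigenvectors of S_w^-1 S_b,
    ordered from most to least discriminative."""
    sw, sb = scatter_matrices(classes)
    det_w = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # S_w must be invertible
    inv_w = [[sw[1][1] / det_w, -sw[0][1] / det_w],
             [-sw[1][0] / det_w, sw[0][0] / det_w]]
    m = [[sum(inv_w[i][k] * sb[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    if b == 0 and c == 0:
        # M is already diagonal, so its eigenvectors are the coordinate axes.
        if a >= d:
            return [[1.0, 0.0], [0.0, 1.0]], [a, d]
        return [[0.0, 1.0], [1.0, 0.0]], [d, a]
    half_tr = (a + d) / 2.0
    root = math.sqrt(max(half_tr ** 2 - (a * d - b * c), 0.0))
    eigvals = [half_tr + root, half_tr - root]
    cols = []
    for lam in eigvals:
        # Solve (M - lam*I) v = 0 using the row with the larger off-diagonal entry.
        v = (b, lam - a) if abs(b) >= abs(c) else (lam - d, c)
        norm = math.hypot(v[0], v[1])
        cols.append((v[0] / norm, v[1] / norm))
    w = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    return w, eigvals


def regulate(frame, w):
    """Apply the learned transform to one frame: y = W^T x."""
    return [w[0][j] * frame[0] + w[1][j] * frame[1] for j in range(2)]


# Toy data: two hypothetical speakers separated along the (1, 1) direction.
example_classes = {
    "spk_a": [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]],
    "spk_b": [[9.0, 9.0], [9.0, 11.0], [11.0, 9.0], [11.0, 11.0]],
}
W, eigvals = lda_transform_2d(example_classes)
```

On this toy data the most discriminative column of `W` points along (1, 1)/sqrt(2), the direction separating the two speakers' means. Real cepstral features have many more dimensions, where a general symmetric-definite eigensolver would replace the 2x2 closed form.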