IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speaker identification from shouted speech: Analysis and compensation



Abstract

Text-independent speaker identification is studied using neutral and shouted speech in Finnish to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that speaker identification accuracy drops from perfect (100 %) to 8.71 % under vocal mode mismatch. Because of this dramatic degradation in recognition accuracy, we propose a joint density GMM mapping technique to compensate the MFCC features. The mapping is trained on a disjoint emotional speech corpus to obtain a completely speaker- and speech-mode-independent emotion-neutralizing mapping. With this compensation, the 8.71 % identification accuracy increases to 32.00 %, without substantially degrading the matched train-test condition.
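The baseline recognizer described above (per-speaker GMMs over MFCC frames, identification by maximum log-likelihood) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the synthetic Gaussian "features" merely stand in for real MFCCs, and the speaker means, component counts, and frame counts are assumptions.

```python
# Sketch of GMM-based text-independent speaker identification.
# Synthetic features stand in for MFCCs extracted from real speech.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_mfcc = 13  # typical MFCC dimensionality

# Simulated training features per speaker: (frames x coefficients).
# The per-speaker means are arbitrary assumptions for illustration.
train = {
    "spk1": rng.normal(loc=0.0, scale=1.0, size=(500, n_mfcc)),
    "spk2": rng.normal(loc=2.0, scale=1.0, size=(500, n_mfcc)),
}

# One GMM per speaker, trained on that speaker's (neutral-mode) features.
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(feats)
    for name, feats in train.items()
}

def identify(test_feats):
    """Return the speaker whose GMM gives the highest mean log-likelihood."""
    scores = {name: gmm.score(test_feats) for name, gmm in models.items()}
    return max(scores, key=scores.get)

# A matched-mode test utterance drawn from speaker 2's distribution.
test = rng.normal(loc=2.0, scale=1.0, size=(200, n_mfcc))
print(identify(test))  # prints "spk2"
```

Under vocal mode mismatch, shouted test frames no longer fit the neutral-mode GMMs, which is what drives the accuracy collapse reported in the abstract.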
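The proposed compensation trains a GMM on joint vectors of paired shouted and neutral features, then maps a shouted frame to its conditional expectation under that model. A minimal sketch of this joint density mapping, assuming a synthetic paired corpus in which "shouted" features are a shifted, noisy copy of the neutral ones (the shift, dimensions, and component count are all illustrative assumptions):

```python
# Sketch of joint density GMM mapping: map shouted features x toward
# neutral features y via the conditional expectation E[y | x].
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
d = 2  # feature dimension kept small for illustration (MFCCs would be ~13)

# Disjoint paired corpus: neutral frames y and "shouted" frames x.
# The constant shift stands in for the shout-induced spectral offset.
y = rng.normal(size=(1000, d))
x = y + 1.5 + 0.1 * rng.normal(size=(1000, d))

# Fit a GMM on the joint vectors z = [x; y].
joint = GaussianMixture(n_components=2, covariance_type="full",
                        random_state=0).fit(np.hstack([x, y]))

def compensate(x_frames):
    """Map shouted frames to E[y | x] under the joint GMM."""
    K = joint.n_components
    # Posterior responsibilities p(k | x) from the marginal GMM over x.
    log_resp = np.zeros((len(x_frames), K))
    for k in range(K):
        diff = x_frames - joint.means_[k, :d]
        cov_xx = joint.covariances_[k, :d, :d]
        inv_xx = np.linalg.inv(cov_xx)
        _, logdet = np.linalg.slogdet(cov_xx)
        # Constant terms cancel across components, so they are omitted.
        log_resp[:, k] = (np.log(joint.weights_[k]) - 0.5 * logdet
                          - 0.5 * np.einsum("ni,ij,nj->n", diff, inv_xx, diff))
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

    # Mixture of per-component conditional means:
    # E[y | x, k] = mu_y_k + Cov_yx_k Cov_xx_k^{-1} (x - mu_x_k)
    out = np.zeros_like(x_frames)
    for k in range(K):
        cov_yx = joint.covariances_[k, d:, :d]
        inv_xx = np.linalg.inv(joint.covariances_[k, :d, :d])
        cond = joint.means_[k, d:] + (x_frames - joint.means_[k, :d]) @ (cov_yx @ inv_xx).T
        out += resp[:, [k]] * cond
    return out

mapped = compensate(x)
print(np.abs(mapped - y).mean())  # far below the 1.5 shift left uncompensated
```

At recognition time, shouted test frames would be passed through `compensate` before scoring against the neutral-mode speaker GMMs, which is how the abstract's 8.71 % to 32.00 % improvement is obtained.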
