Journal: IEEE Transactions on Audio, Speech, and Language Processing

Speaker Identification Within Whispered Speech Audio Streams

Abstract

Whisper is an alternative speech production mode used by subjects in natural conversation to protect privacy. Due to the profound differences between whisper and neutral speech in both excitation and vocal tract function, the performance of speaker identification systems trained with neutral speech degrades significantly on whispered speech. In this paper, a seamless neutral/whisper mismatched closed-set speaker recognition system is developed. First, the performance characteristics of a neutral-trained closed-set speaker ID system based on a Mel-frequency cepstral coefficient-Gaussian mixture model (MFCC-GMM) framework are considered. It is observed that for whisper speaker recognition, performance degradation is concentrated in only a subset of speakers. Next, it is shown that the performance loss for speaker identification in neutral/whisper mismatched conditions is focused on phonemes other than low-energy unvoiced consonants. In order to increase system performance for unvoiced consonants, an alternative feature extraction algorithm based on linear and exponential frequency scales is applied. The acoustic properties of misrecognized and correctly recognized whisper are analyzed in order to develop more effective processing schemes. A two-dimensional feature space is proposed in order to predict on which whispered utterances the system will perform poorly, with evaluations conducted to measure the quality of whispered speech. Finally, a system for seamless neutral/whisper speaker identification is proposed, resulting in an absolute improvement of 8.85%-10.30% for speaker recognition, with the best closed-set speaker ID performance of 88.35% obtained for a total of 961 read whisper test utterances, and 83.84% using a total of 495 spontaneous whisper test utterances.
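To make the closed-set GMM scoring idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes one diagonal-covariance GMM per enrolled speaker, fitted with plain EM, and picks the speaker whose model gives the highest average frame log-likelihood. The abstract does not give the feature configuration, model sizes, or corpus, so the 13-dimensional random vectors stand in for MFCC frames, and every name here (`fit_diag_gmm`, `identify`, `make_frames`, the `spk_*` labels) is hypothetical.

```python
import numpy as np

def _log_weighted_densities(X, weights, means, variances):
    """Log of weight_k * N(x | mean_k, diag(var_k)) for every frame and component."""
    diff = X[:, None, :] - means[None, :, :]            # (frames, components, dims)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    return np.log(weights) + log_gauss                  # (frames, components)

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def fit_diag_gmm(X, n_components=2, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM to frames X (frames x dims) with EM."""
    rng = np.random.default_rng(seed)
    n_frames = X.shape[0]
    means = X[rng.choice(n_frames, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = _log_weighted_densities(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ (X ** 2)) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under one speaker's GMM."""
    return float(np.mean(_logsumexp(_log_weighted_densities(X, *gmm), axis=1)))

def identify(X, models):
    """Closed-set decision: the enrolled speaker whose model scores X highest."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))

# Synthetic stand-in for MFCC frames (assumption: NOT real neutral or whisper data).
rng = np.random.default_rng(0)
speaker_means = {name: rng.normal(scale=3.0, size=13)
                 for name in ("spk_a", "spk_b", "spk_c")}

def make_frames(name, n_frames, rng):
    """Draw unit-variance frames around a speaker-specific mean vector."""
    return speaker_means[name] + rng.normal(size=(n_frames, 13))

# Enroll each speaker, then score an unseen utterance from one of them.
models = {name: fit_diag_gmm(make_frames(name, 200, rng)) for name in speaker_means}
test_frames = make_frames("spk_b", 50, rng)
print(identify(test_frames, models))
```

In the mismatched condition the abstract studies, the models would be trained on neutral speech and the test frames would come from whisper. The shift between the two feature distributions is what lowers the true speaker's likelihood relative to the competing models.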
