Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Duration mismatch compensation for i-vector based speaker recognition systems

Abstract

Speaker recognition systems trained on long-duration utterances are known to perform significantly worse when they encounter short test segments. To address this mismatch, we analyze the effect of duration variability on the phoneme distributions of speech utterances and on i-vector length. We demonstrate that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively. Modeling duration variability as additive noise in the i-vector space, we propose three strategies for its compensation: i) multi-duration training of the Probabilistic Linear Discriminant Analysis (PLDA) model, ii) score calibration using log duration as a Quality Measure Function (QMF), and iii) multi-duration PLDA training with synthesized short-duration i-vectors. Experiments follow the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) protocol with varying test utterance durations. The results demonstrate the effectiveness of the proposed schemes under short-duration test conditions, especially with the QMF calibration approach.
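
Strategy ii) can be illustrated with a minimal sketch. The abstract does not give the exact form of the calibration, so the code below assumes a common one: the calibrated score is a linear function of the raw score and the log of the test duration, with weights fitted by logistic regression. The function names (`qmf_features`, `fit_calibration`, `calibrate`), the learning rate, and the iteration count are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def qmf_features(scores, durations_sec):
    """Stack raw score, log-duration QMF, and a bias column (assumed feature set)."""
    scores = np.asarray(scores, dtype=float)
    log_dur = np.log(np.asarray(durations_sec, dtype=float))
    return np.column_stack([scores, log_dur, np.ones_like(scores)])

def fit_calibration(scores, durations_sec, labels, lr=0.05, n_iter=2000):
    """Fit weights [w_score, w_logdur, bias] by plain gradient descent
    on the logistic (cross-entropy) loss. labels: 1 = target, 0 = non-target."""
    X = qmf_features(scores, durations_sec)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = np.clip(X @ w, -30.0, 30.0)   # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))      # predicted target probability
        w -= lr * (X.T @ (p - y)) / len(y)
    return w

def calibrate(scores, durations_sec, w):
    """Calibrated score: w_score * s + w_logdur * log(duration) + bias."""
    return qmf_features(scores, durations_sec) @ w
```

Under this assumption, the log-duration term shifts the score of a trial according to how short the test segment is, so that a single decision threshold can serve trials of different durations. The step size here suits scores and log durations of modest magnitude; larger score ranges would call for feature scaling or a smaller step.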
