Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing

PLDA FOR SPEAKER VERIFICATION WITH UTTERANCES OF ARBITRARY DURATION



Abstract

The duration of speech segments has traditionally been controlled in the NIST speaker recognition evaluations, so researchers working in this framework have been relieved of the responsibility of dealing with the duration variability that arises in practical applications. The fixed-dimensional i-vector representation of speech utterances is ideal for working under such controlled conditions, and ignoring the fact that i-vectors extracted from short utterances are less reliable than those extracted from long utterances leads to a very simple formulation of the speaker recognition problem. However, a more realistic approach seems to be needed to handle duration variability properly. In this paper, we show how to quantify the uncertainty associated with the i-vector extraction process and propagate it into a PLDA classifier. We evaluated this approach using test sets derived from the NIST 2010 core and extended core conditions by randomly truncating the utterances in the female telephone speech trials so that the durations of all enrollment and test utterances lay in the range 3-60 seconds, and we found that it led to substantial improvements in accuracy. Although the likelihood ratio computation for speaker verification is more computationally expensive than in the standard i-vector/PLDA classifier, the cost is still quite modest, as it reduces to computing the probability density functions of two full covariance Gaussians (irrespective of the number of utterances used to enroll a speaker).
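The idea of scoring a trial with two full covariance Gaussians can be illustrated with a minimal sketch. It assumes a simplified two-covariance PLDA model that is not taken from the paper: each i-vector is modeled as a global mean plus a speaker factor with covariance `B`, plus a residual with covariance `W + U`, where `U` stands for the utterance-specific uncertainty of the i-vector estimate (larger for short utterances). The function names, the symbols `B`, `W`, `U_enroll`, `U_test`, and the restriction to a single enrollment utterance are all assumptions made for illustration, and the authors' actual formulation may differ.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a full covariance Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)          # log |cov|
    maha = diff @ np.linalg.solve(cov, diff)    # Mahalanobis distance squared
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def verification_llr(w_enroll, w_test, mean, B, W, U_enroll, U_test):
    """Log likelihood ratio for one enrollment and one test i-vector.

    Assumed (hypothetical) model: w = mean + y + e, with y ~ N(0, B) shared
    by utterances of the same speaker and e ~ N(0, W + U), where U is the
    per-utterance uncertainty covariance.
    """
    total_enroll = B + W + U_enroll
    total_test = B + W + U_test
    zeros = np.zeros_like(B)

    # Same speaker: the two i-vectors are correlated through B.
    cov_same = np.block([[total_enroll, B], [B, total_test]])
    # Different speakers: the two i-vectors are independent.
    cov_diff = np.block([[total_enroll, zeros], [zeros, total_test]])

    x = np.concatenate([w_enroll, w_test])
    mu = np.concatenate([mean, mean])
    return gaussian_logpdf(x, mu, cov_same) - gaussian_logpdf(x, mu, cov_diff)


if __name__ == "__main__":
    dim = 2
    mean = np.zeros(dim)
    B = np.eye(dim)               # between-speaker covariance (toy value)
    W = 0.1 * np.eye(dim)         # within-speaker covariance (toy value)
    no_uncertainty = np.zeros((dim, dim))
    high_uncertainty = 100.0 * np.eye(dim)   # mimics a very short utterance

    w = np.array([1.0, 1.0])
    print(verification_llr(w, w, mean, B, W, no_uncertainty, no_uncertainty))
    print(verification_llr(w, w, mean, B, W, high_uncertainty, high_uncertainty))
```

In this toy setting, a large uncertainty covariance pulls the score toward zero, which matches the intuition that i-vectors from short utterances should carry less weight in the decision.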
