Journal: IEEE Transactions on Audio, Speech, and Language Processing

Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning

Abstract

The success of the recent i-vector approach to speaker verification relies on two things: the capability of i-vectors to capture speaker characteristics, and the ability of the subsequent channel compensation methods to suppress channel variability. Typically, a single i-vector is extracted from an utterance regardless of the utterance's length. This paper investigates how utterance length affects the discriminative power of i-vectors and demonstrates that this discriminative power reaches a plateau quickly as the utterance length increases. This observation suggests that a long conversation can be used more effectively by partitioning it into a number of sub-utterances, so that more i-vectors can be produced for each conversation. To increase the number of sub-utterances without sacrificing the representation power of the corresponding i-vectors, frame-index randomization and utterance partitioning are applied repeatedly. Results on the NIST 2010 speaker recognition evaluation (SRE) suggest that (1) using more i-vectors per conversation helps to find more robust linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) transformation matrices, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps i-vector based support vector machines (SVMs) to find better decision boundaries, so that SVM scoring outperforms cosine distance scoring by 19% and 9% in terms of minimum normalized DCF and EER, respectively.
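
The repeated frame-index randomization and utterance partitioning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `partition_with_randomization`, its parameters, and the equal-size cutting rule are assumptions made for illustration, and the i-vector extractor that would consume each sub-utterance is not shown. The sketch relies on the fact that i-vector statistics are accumulated over frames without regard to their order, so shuffling the frame indices before each partitioning pass yields a different set of sub-utterances every time.

```python
import random
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")  # e.g. one acoustic feature vector per frame


def partition_with_randomization(
    frames: Sequence[Frame],
    num_partitions: int,
    num_repeats: int,
    seed: int = 0,
) -> List[List[Frame]]:
    """Return num_repeats * num_partitions sub-utterances (hypothetical helper).

    Each repeat shuffles the frame indices, then cuts the shuffled order
    into num_partitions non-overlapping pieces of nearly equal length.
    """
    n = len(frames)
    if num_partitions < 1 or num_repeats < 1:
        raise ValueError("num_partitions and num_repeats must be >= 1")
    if n < num_partitions:
        raise ValueError("need at least one frame per partition")

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sub_utterances: List[List[Frame]] = []
    for _ in range(num_repeats):
        order = list(range(n))
        rng.shuffle(order)  # frame-index randomization
        for p in range(num_partitions):
            # Integer boundaries spread any remainder across the pieces.
            start = p * n // num_partitions
            stop = (p + 1) * n // num_partitions
            sub_utterances.append([frames[i] for i in order[start:stop]])
    return sub_utterances


if __name__ == "__main__":
    toy_frames = list(range(10))  # stand-in for 10 feature vectors
    subs = partition_with_randomization(toy_frames, num_partitions=4, num_repeats=3)
    print(len(subs), "sub-utterances, lengths:", [len(s) for s in subs])
```

With `num_partitions=4` and `num_repeats=3`, one conversation yields 12 sub-utterances instead of a single utterance, and each of them would then be passed to the i-vector extractor to supply additional training vectors for LDA, WCCN, or the SVM.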
