Odyssey 2010: The Speaker and Language Recognition Workshop

Investigation of Speaker-Clustered UBMs based on Vocal Tract Lengths and MLLR matrices for Speaker Verification

Abstract

A single, large, speaker-independent Gaussian Mixture Model based Universal Background Model (GMM-UBM) is commonly used as the alternative hypothesis in speaker verification. The speaker models are themselves derived from the UBM using Maximum a Posteriori (MAP) adaptation. During verification, the log-likelihood ratio between the target model and the GMM-UBM is computed to accept or reject the claimant. A single UBM may not be appropriate for different population groups, especially when the impostors are close to the target speaker. In this paper, we investigate the use of a Speaker Cluster-wise UBM (SC-UBM) for a group of target speakers, based on two different similarity measures. In the first approach, speakers are grouped into clusters according to their Vocal Tract Lengths (VTLs). Speakers with the same VTL parameter have similar vocal-tract geometry, which is a speaker-dependent characteristic. In the second approach, we build MLLR super-vectors from the Maximum Likelihood Linear Regression (MLLR) matrices of the target speakers and use them to cluster the speakers into groups. Each SC-UBM is derived from the GMM-UBM by MLLR adaptation with data from the corresponding group of target speakers. Finally, speaker-dependent models are adapted from their respective SC-UBMs using MAP. In the proposed method, the log-likelihood ratio is computed between the target model and its corresponding SC-UBM. We compare the performance of this method with the single-UBM method for varying numbers of clusters. The experiments are performed on the NIST 2004 SRE core condition, and we show that the proposed method, with a slight increase in the number of UBMs, always outperforms the conventional single GMM-UBM system.
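
The scoring pipeline described in the abstract can be illustrated with a minimal NumPy sketch on toy data. It is not the authors' implementation, and several pieces are assumptions made for brevity: the GMMs have diagonal covariances, only the means are adapted, the relevance factor of 16 is a conventional default, the clustering is plain k-means (the abstract does not name a clustering algorithm), and the per-speaker vectors `spk_vecs` are hypothetical stand-ins for MLLR super-vectors or VTL parameters. Most importantly, the cluster-wise UBMs are derived here with MAP mean adaptation instead of the MLLR adaptation the abstract specifies. All function names are illustrative.

```python
import numpy as np

def _log_components(X, w, mu, var):
    """Log of weight * Gaussian density per frame and mixture, shape (T, M)."""
    diff = X[:, None, :] - mu[None, :, :]
    return (np.log(w)
            - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
            - 0.5 * np.sum(diff ** 2 / var, axis=2))

def gmm_loglik(X, w, mu, var):
    """Per-frame log-likelihood under a diagonal-covariance GMM."""
    lc = _log_components(X, w, mu, var)
    m = lc.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(lc - m).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(X, w, mu, var, relevance=16.0):
    """Mean-only MAP adaptation of a GMM toward the data X (relevance is assumed)."""
    lc = _log_components(X, w, mu, var)
    post = np.exp(lc - lc.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)            # responsibilities (T, M)
    n = post.sum(axis=0)                                # soft counts (M,)
    ex = post.T @ X / np.maximum(n, 1e-10)[:, None]     # data means per mixture
    alpha = (n / (n + relevance))[:, None]
    return alpha * ex + (1.0 - alpha) * mu

def kmeans(V, k, iters=20, seed=0):
    """Tiny k-means; assumed clustering step over per-speaker vectors V."""
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=k, replace=False)]
    for _ in range(iters):
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = V[labels == c].mean(axis=0)
    return labels

def llr(X, target_mu, ubm_mu, w, var):
    """Average log-likelihood ratio: target model versus background model."""
    return float(np.mean(gmm_loglik(X, w, target_mu, var)
                         - gmm_loglik(X, w, ubm_mu, var)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, D = 4, 3
    w, var = np.full(M, 1.0 / M), np.ones((M, D))
    ubm_mu = rng.normal(size=(M, D))                    # toy global GMM-UBM means
    # Hypothetical per-speaker vectors (stand-ins for MLLR super-vectors / VTLs).
    spk_vecs = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
    spk_data = [rng.normal(loc=0.5 * (i // 2), size=(200, D)) for i in range(4)]
    labels = kmeans(spk_vecs, k=2)
    # One SC-UBM per cluster (MAP used here in place of MLLR; see lead-in).
    sc_ubm = {c: map_adapt_means(
                  np.vstack([spk_data[i] for i in np.flatnonzero(labels == c)]),
                  w, ubm_mu, var)
              for c in set(labels.tolist())}
    c = int(labels[0])
    target_mu = map_adapt_means(spk_data[0], w, sc_ubm[c], var)
    trial = rng.normal(size=(100, D))                   # toy test utterance
    print("LLR against the cluster UBM:", llr(trial, target_mu, sc_ubm[c], w, var))
```

The point of the design is the last two steps: the target model is adapted from its cluster's UBM, and the trial is scored against that same cluster UBM instead of the global one, so the alternative hypothesis represents speakers acoustically close to the target.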
