Journal: Computer Speech and Language

Domain compensation based on phonetically discriminative features for speaker verification


Abstract

This paper presents a new domain compensation framework that uses phonetically discriminative features extracted from domain-dependent deep neural networks (DNNs). Domain compensation can be applied in either a supervised or an unsupervised manner, depending on whether the domain information of the development data is provided in advance. In the supervised case, the DNNs are trained separately on the development speech recordings of each given domain. In the unsupervised case, the development data are first clustered automatically into domains using Gaussian mixture model (GMM) mean supervectors generated from each speech recording, and DNNs are then trained on the resulting clusters. Finally, domain variability is compensated during the target speaker modeling step with support vector machines (SVMs), which take as input statistical vectors derived from the discriminative features extracted from the domain-dependent DNNs. The main strength of the proposed framework is that it requires no speaker labels in the development dataset, a considerable advantage over state-of-the-art techniques that need speaker labels to train inter-speaker and/or intra-speaker variability models or channel compensation. Three speaker verification systems are investigated to examine the effectiveness of the framework. Experimental results on the NIST SRE 2010 task show that an initial implementation of the proposed framework performs competitively with state-of-the-art techniques.
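
The unsupervised step in the abstract (one GMM mean supervector per recording, then automatic clustering into domains) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the use of relevance-MAP adaptation from a diagonal-covariance universal background model (UBM), the relevance factor of 16, plain k-means with farthest-first initialization as the clustering method, and the hypothetical function names `map_mean_supervector` and `kmeans`. The abstract does not say which adaptation or clustering algorithm the authors used.

```python
import numpy as np

def map_mean_supervector(frames, ubm_weights, ubm_means, ubm_vars, relevance=16.0):
    """Relevance-MAP adapt the UBM means to one recording and stack them.

    frames: (T, D) acoustic features; ubm_means, ubm_vars: (C, D); ubm_weights: (C,).
    The relevance factor is an illustrative assumption, not a value from the paper.
    """
    # Log-density of every frame under every diagonal-covariance component.
    diff = frames[:, None, :] - ubm_means[None, :, :]            # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1))
    # Component posteriors per frame, normalized in a numerically stable way.
    log_post = np.log(ubm_weights) + log_gauss                   # (T, C)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order statistics, then the posterior-weighted frame mean.
    n = post.sum(axis=0)                                         # (C,)
    frame_mean = (post.T @ frames) / np.maximum(n, 1e-10)[:, None]
    # Interpolate between the data mean and the UBM mean for each component.
    alpha = (n / (n + relevance))[:, None]
    adapted = alpha * frame_mean + (1.0 - alpha) * ubm_means
    return adapted.reshape(-1)                                   # (C * D,)

def kmeans(x, k, iters=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

In this sketch, each development recording would be passed through `map_mean_supervector`, the resulting vectors stacked into one matrix, and `kmeans` called with the desired number of domains. Each resulting cluster would then serve as the training set for one domain-dependent DNN, which the sketch does not cover.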
