Journal: Pattern Recognition Letters

Variational DNN embeddings for text-independent speaker verification


Abstract

In state-of-the-art text-independent speaker verification systems, a discriminative deep neural network (DNN) model learns speaker-discriminative representations (x-vectors) for utterances using labeled data. For the verification task, a Probabilistic Linear Discriminant Analysis (PLDA) model is used to decide whether two x-vectors come from the same speaker. The PLDA scoring model assumes Gaussian priors and conditional distributions across speakers, an assumption that x-vectors do not satisfy. This work introduces a variational regularization term that encourages the network to generate embeddings that follow a desired prior distribution. The regularization function performs a non-parametric match between the embeddings generated in a training mini-batch and a sample from the desired distribution. Unlike Variational Auto Encoders (VAEs), no distribution parameters need to be learned and no sampling scheme is employed, which makes the proposed method flexible with respect to the desired distribution. Our experiments compared the proposed method with the standard x-vector system and with other recently proposed approaches for generating Gaussianized representations. We assessed the verification performance of the systems using the Fisher English Training Speech Part II database in eight test conditions based on the gender of the speakers and the duration of the speech segments. Besides using the standard Gaussian distribution as the prior, we also applied a less strict distribution (uniform). The proposed method outperformed the others in all test conditions for both distributions, with performance gains between 7.68% and 20.49% over the standard x-vectors. To understand the effect of the regularizations on the embedding space, we also conducted a 2-dimensional visual comparison, which showed better-quality clusters when the regularizations were applied. (c) 2021 Elsevier B.V. All rights reserved.
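The abstract does not state which function performs the non-parametric match, so the following is only an illustrative sketch of the general idea: a penalty that compares a mini-batch of embeddings with a sample drawn from the desired prior, without learning any distribution parameters. It uses the (biased) squared Maximum Mean Discrepancy with an RBF kernel as a stand-in; the function names `rbf_kernel` and `mmd2`, the bandwidth, the batch size, and the embedding dimension are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Pairwise RBF kernel values between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))

def mmd2(embeddings, prior_sample, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples (hypothetical stand-in
    for the paper's regularizer; the abstract does not specify the function)."""
    k_xx = rbf_kernel(embeddings, embeddings, bandwidth).mean()
    k_yy = rbf_kernel(prior_sample, prior_sample, bandwidth).mean()
    k_xy = rbf_kernel(embeddings, prior_sample, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

rng = np.random.default_rng(0)

# Toy "embeddings" that clearly do not follow a standard Gaussian.
batch = rng.normal(loc=3.0, scale=0.5, size=(256, 8))

# Samples from two candidate priors: standard Gaussian and uniform.
gaussian_prior = rng.standard_normal((256, 8))
uniform_prior = rng.uniform(-1.0, 1.0, size=(256, 8))

penalty_gauss = mmd2(batch, gaussian_prior)      # large: batch is far from the prior
penalty_uniform = mmd2(batch, uniform_prior)     # same code, different prior
matched = mmd2(rng.standard_normal((256, 8)), gaussian_prior)  # small: same distribution

print(penalty_gauss, penalty_uniform, matched)
```

Swapping the prior only changes the sample passed in, which mirrors the flexibility the abstract claims for different desired distributions. In an actual training setup, a term like this would presumably be computed in an automatic-differentiation framework and added, with some weight, to the speaker-classification loss; that weighting is likewise an assumption here.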
