Variational DNN embeddings for text-independent speaker verification

Pinheiro Hector N. B.; Ren Tsang Ing; Adami Andre G.; Cavalcanti George D. C.

Abstract

In state-of-the-art text-independent speaker verification systems, a discriminative deep neural network (DNN) model learns speaker-discriminative representations (x-vectors) for utterances using labeled data. For the verification task, a Probabilistic Linear Discriminant Analysis (PLDA) model is used to decide whether two x-vectors come from the same speaker. The PLDA scoring model assumes Gaussian priors and conditional distributions across speakers, an assumption that does not hold for x-vectors. This work introduces a variational-based regularization term that encourages the network to generate embeddings that follow a desired prior distribution. The regularization function performs a non-parametric match between the embeddings generated during mini-batch training and a sample from the desired distribution. Unlike Variational Autoencoders (VAEs), no distribution parameters need to be learned and no sampling scheme is employed, which makes the proposed method flexible for different desired distributions. Our experiments compared the proposed method with the standard x-vector system and with other recently proposed approaches for generating Gaussianized representations. We assessed the verification performance of the systems using the Fisher English Training Speech Part II database in eight test conditions based on the gender of the speakers and the duration of the speech segments. Besides using the standard Gaussian distribution as the prior, we also applied a less strict distribution (uniform). The proposed method outperformed the others in all test conditions for both distributions, with performance gains between 7.68% and 20.49% over the standard x-vectors. To understand the effect of the regularizations on the embedding space, we also conducted a 2-dimensional visual comparison, which showed better-quality clusters when the regularizations were applied. (c) 2021 Elsevier B.V. All rights reserved.
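The abstract does not say which function performs the non-parametric match between mini-batch embeddings and a sample from the desired prior. As a minimal sketch only, the block below assumes a squared Maximum Mean Discrepancy (MMD) with a Gaussian kernel, which is one common sample-based way to compare two distributions; it is not claimed to be the authors' regularizer. All names (`mmd_squared`, `sample_gaussian`, `sample_uniform`), the bandwidth, the batch size, and the uniform bounds are hypothetical choices made for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(embeddings, prior_sample, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples.

    Assumption: MMD stands in for the unspecified matching function.
    Only samples are compared; no distribution parameters are learned.
    """
    return (mean_kernel(embeddings, embeddings, bandwidth)
            + mean_kernel(prior_sample, prior_sample, bandwidth)
            - 2.0 * mean_kernel(embeddings, prior_sample, bandwidth))


def sample_gaussian(rng, n, dim):
    """Draw n vectors from a standard Gaussian prior."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def sample_uniform(rng, n, dim, low=-1.0, high=1.0):
    """Draw n vectors from a uniform prior (bounds are hypothetical)."""
    return [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim, batch = 4, 64

# Sample from the desired prior, drawn once per mini-batch.
prior = sample_gaussian(rng, batch, dim)

# Two stand-in "embedding batches": one already Gaussian, one shifted away.
matched = sample_gaussian(rng, batch, dim)
shifted = [[v + 3.0 for v in row] for row in sample_gaussian(rng, batch, dim)]

penalty_matched = mmd_squared(matched, prior)
penalty_shifted = mmd_squared(shifted, prior)

# Swapping the prior only changes the sampler, not the penalty function.
uniform_prior = sample_uniform(rng, batch, dim)
penalty_vs_uniform = mmd_squared(matched, uniform_prior)
```

Under these assumptions, a batch that drifts away from the prior receives a larger penalty than one that already follows it. In training, such a penalty would presumably be added to the speaker-classification loss with some weighting factor, which is also an assumption here since the abstract gives no such detail.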

Bibliographic details

Source
Pattern Recognition Letters, 2021, Issue 8, pp. 100-106 (7 pages)
Authors
Pinheiro Hector N. B.; Ren Tsang Ing; Adami Andre G.; Cavalcanti George D. C.
Author affiliations

Univ Fed Pernambuco UFPE Recife PE Brazil;

Univ Fed Pernambuco UFPE Recife PE Brazil;

Univ Caxias Sul UCS Caxias Do Sul RS Brazil;

Univ Fed Pernambuco UFPE Recife PE Brazil;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Speaker recognition; Speaker verification; Deep Neural Networks;

Similar literature

1. Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification [J]. Wang Shuai, Huang Zili, Qian Yanmin. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2019, Issue 11
2. Text-Independent Speaker Verification Using Variational Gaussian Mixture Model [J]. Mohammad Hossein Moattar, Mohammad Mehdi Homayounpour. ETRI Journal. 2011, Issue 6
3. Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification [C]. Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020
4. DNN Based Speaker Recognition System [D]. Song Hangyu. 2020
5. Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles [O]. Soo Jin Park, Gary Yeung, Neda Vesselinova
6. DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis [O]. Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. 2019
