
Deep multi-metric learning for text-independent speaker verification



Abstract

Text-independent speaker verification is an important artificial intelligence problem with a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services. The purpose of text-independent speaker verification is to determine whether two given uncontrolled utterances originate from the same speaker or not. Extracting speech features for each speaker using deep neural networks is a promising direction to explore, and a straightforward solution is to train the discriminative feature extraction network with a metric learning loss function. However, a single loss function often has certain limitations. Thus, we use deep multi-metric learning to address the problem, introducing three different losses: triplet loss, N-pair loss and angular loss. The three loss functions work in a cooperative way to train a feature extraction network equipped with residual connections and squeeze-and-excitation attention. We conduct experiments on the large-scale VoxCeleb2 dataset, which contains over a million utterances from over 6,000 speakers, and the proposed deep neural network obtains an equal error rate of 3.48%, which is a very competitive result. Code for both training and testing, together with pretrained models, is available at https://github.com/Greatjiweix/DmmltiSV, which is the first publicly available code repository for large-scale text-independent speaker verification with performance on par with state-of-the-art systems. (C) 2020 Elsevier B.V. All rights reserved.
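The core technique described in the abstract is training the speaker-embedding network with three cooperating metric-learning losses (triplet, N-pair and angular). Below is a minimal sketch of how such a combined objective could look in PyTorch; it is not the authors' released implementation (see the repository linked above), and the margin, cone angle and loss weights are illustrative assumptions.

# Minimal sketch of a combined multi-metric objective (triplet + N-pair + angular).
# Not the authors' released code; margin, angle and loss weights are assumed values.
import math
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the gap between anchor-positive and anchor-negative squared distances.
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

def n_pair_loss(anchor, positive, negatives):
    # N-pair loss: one anchor/positive pair scored against several negatives.
    # negatives has shape (batch, num_negatives, dim).
    pos_sim = (anchor * positive).sum(dim=1, keepdim=True)           # (B, 1)
    neg_sim = torch.bmm(negatives, anchor.unsqueeze(2)).squeeze(2)   # (B, N)
    logits = torch.cat([torch.zeros_like(pos_sim), neg_sim - pos_sim], dim=1)
    return torch.logsumexp(logits, dim=1).mean()  # log(1 + sum_i exp(s_ni - s_p))

def angular_loss(anchor, positive, negative, alpha_deg=45.0):
    # Angular loss: pushes the negative outside a cone (half-angle alpha)
    # built around the anchor-positive pair.
    tan_sq = math.tan(math.radians(alpha_deg)) ** 2
    term = (4.0 * tan_sq * ((anchor + positive) * negative).sum(dim=1)
            - 2.0 * (1.0 + tan_sq) * (anchor * positive).sum(dim=1))
    return F.softplus(term).mean()  # log(1 + exp(term))

def multi_metric_loss(anchor, positive, negatives, weights=(1.0, 1.0, 1.0)):
    # Cooperative objective: a weighted sum of the three losses (weights assumed).
    first_neg = negatives[:, 0]  # a single negative for the pairwise losses
    return (weights[0] * triplet_loss(anchor, positive, first_neg)
            + weights[1] * n_pair_loss(anchor, positive, negatives)
            + weights[2] * angular_loss(anchor, positive, first_neg))

if __name__ == "__main__":
    B, N, D = 8, 5, 256  # batch size, negatives per anchor, embedding dimension
    anchor = F.normalize(torch.randn(B, D), dim=1)
    positive = F.normalize(torch.randn(B, D), dim=1)
    negatives = F.normalize(torch.randn(B, N, D), dim=2)
    print(multi_metric_loss(anchor, positive, negatives).item())

In the paper these losses supervise embeddings produced by a ResNet-style extractor with squeeze-and-excitation (SENet) attention; here random unit-norm vectors stand in for those embeddings.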

Bibliographic Information

  • Source
    Neurocomputing | 2020, Issue 14 | pp. 394-400 | 7 pages
  • Author Affiliations

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

  • Indexed In: Science Citation Index (SCI); Engineering Index (EI);
  • Original Format: PDF
  • Language: English
  • Chinese Library Classification (CLC):
  • Keywords

    Speaker verification; N-pair loss; Angular loss; Triplet loss; SENet;

