
Deep multi-metric learning for text-independent speaker verification



Abstract

Text-independent speaker verification is an important artificial intelligence problem with a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services. The purpose of text-independent speaker verification is to determine whether two given uncontrolled utterances originate from the same speaker or not. Extracting speech features for each speaker using deep neural networks is a promising direction to explore, and a straightforward solution is to train the discriminative feature extraction network with a metric learning loss function. However, a single loss function often has certain limitations. Thus, we use deep multi-metric learning to address the problem, introducing three different losses: triplet loss, N-pair loss and angular loss. The three loss functions work in a cooperative way to train a feature extraction network equipped with residual connections and squeeze-and-excitation attention. We conduct experiments on the large-scale VoxCeleb2 dataset, which contains over a million utterances from over 6,000 speakers, and the proposed deep neural network obtains an equal error rate of 3.48%, which is a very competitive result. Code for both training and testing, together with pretrained models, is available at https://github.com/Greatjiweix/DmmltiSV, which is the first publicly available code repository for large-scale text-independent speaker verification with performance on par with state-of-the-art systems. (C) 2020 Elsevier B.V. All rights reserved.
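The core technique described in the abstract is training the speaker-embedding network with three cooperating metric-learning losses (triplet, N-pair and angular). Below is a minimal sketch of how such a combined objective could look in PyTorch; it is not the authors' released implementation (see the repository linked above), and the margin, cone angle and loss weights are illustrative assumptions.

# Minimal sketch of a combined multi-metric objective (triplet + N-pair + angular).
# Not the authors' released code; margin, angle and loss weights are assumed values.
import math
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the gap between anchor-positive and anchor-negative squared distances.
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

def n_pair_loss(anchor, positive, negatives):
    # N-pair loss: one anchor/positive pair scored against several negatives.
    # negatives has shape (batch, num_negatives, dim).
    pos_sim = (anchor * positive).sum(dim=1, keepdim=True)           # (B, 1)
    neg_sim = torch.bmm(negatives, anchor.unsqueeze(2)).squeeze(2)   # (B, N)
    logits = torch.cat([torch.zeros_like(pos_sim), neg_sim - pos_sim], dim=1)
    return torch.logsumexp(logits, dim=1).mean()  # log(1 + sum_i exp(s_ni - s_p))

def angular_loss(anchor, positive, negative, alpha_deg=45.0):
    # Angular loss: pushes the negative outside a cone (half-angle alpha)
    # built around the anchor-positive pair.
    tan_sq = math.tan(math.radians(alpha_deg)) ** 2
    term = (4.0 * tan_sq * ((anchor + positive) * negative).sum(dim=1)
            - 2.0 * (1.0 + tan_sq) * (anchor * positive).sum(dim=1))
    return F.softplus(term).mean()  # log(1 + exp(term))

def multi_metric_loss(anchor, positive, negatives, weights=(1.0, 1.0, 1.0)):
    # Cooperative objective: a weighted sum of the three losses (weights assumed).
    first_neg = negatives[:, 0]  # a single negative for the pairwise losses
    return (weights[0] * triplet_loss(anchor, positive, first_neg)
            + weights[1] * n_pair_loss(anchor, positive, negatives)
            + weights[2] * angular_loss(anchor, positive, first_neg))

if __name__ == "__main__":
    B, N, D = 8, 5, 256  # batch size, negatives per anchor, embedding dimension
    anchor = F.normalize(torch.randn(B, D), dim=1)
    positive = F.normalize(torch.randn(B, D), dim=1)
    negatives = F.normalize(torch.randn(B, N, D), dim=2)
    print(multi_metric_loss(anchor, positive, negatives).item())

In the paper these losses supervise embeddings produced by a ResNet-style extractor with squeeze-and-excitation (SENet) attention; here random unit-norm vectors stand in for those embeddings.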

Bibliographic Information

  • Source
    Neurocomputing | 2020, Issue 14 | pp. 394-400 | 7 pages
  • Author Affiliations

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

    Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China;

  • Indexed In: Science Citation Index (SCI); Engineering Index (EI);
  • Original Format: PDF
  • Language: English
  • Chinese Library Classification (CLC):
  • Keywords

    Speaker verification; N-pair loss; Angular loss; Triplet loss; SENet;

