Deep Learning Backend for Single and Multisession i-Vector Speaker Recognition

Omid Ghahabi; Javier Hernando

Abstract

The lack of labeled background data creates a large performance gap between the cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring baselines for i-vectors in speaker recognition. Unsupervised clustering techniques exist to estimate the labels, but they cannot predict the true labels accurately. They also assume that the background data contain several samples from each speaker, which may not hold in practice. In this paper, the authors use Deep Learning (DL) to fill this performance gap given unlabeled background data. To this end, the authors propose an impostor selection algorithm and a universal model adaptation process in a hybrid system based on deep belief networks and deep neural networks that discriminatively models each target speaker. To gain more insight into the behavior of DL techniques in single- and multisession speaker enrollment, experiments are carried out in both scenarios. Experiments on the National Institute of Standards and Technology (NIST) 2014 i-vector challenge show that the proposed DL-based system fills 46% of this performance gap in terms of the minimum of the decision cost function (minDCF). Furthermore, combining the scores of the proposed DL-based system and PLDA with estimated labels covers 79% of this gap.
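To make the baseline comparison concrete, the following minimal Python sketch illustrates four ideas from the abstract. It is not the authors' implementation. It shows cosine scoring of i-vectors, one common way to build a multisession speaker model by averaging length-normalized enrollment i-vectors, a weighted-sum score combination, and one plausible reading of the "percentage of the gap filled" figures. The function names, the averaging rule, the fusion weight `alpha`, and the gap formula are assumptions for illustration. The gap formula is taken as the system's minDCF improvement over cosine divided by the cosine-to-PLDA minDCF difference, and the paper's exact definitions may differ.

```python
import numpy as np


def cosine_score(w_enroll: np.ndarray, w_test: np.ndarray) -> float:
    """Cosine similarity between an enrollment i-vector and a test i-vector."""
    return float(w_enroll @ w_test / (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))


def multisession_model(session_ivectors: np.ndarray) -> np.ndarray:
    """Hypothetical multisession enrollment: average the length-normalized
    i-vectors of all enrollment sessions (one per row) into a single model."""
    norms = np.linalg.norm(session_ivectors, axis=1, keepdims=True)
    return (session_ivectors / norms).mean(axis=0)


def linear_fusion(scores_a: np.ndarray, scores_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical score combination: weighted sum of two systems' scores.
    In practice the scores are usually normalized or calibrated first."""
    return alpha * scores_a + (1.0 - alpha) * scores_b


def gap_filled(min_dcf_cosine: float, min_dcf_plda: float, min_dcf_system: float) -> float:
    """Assumed definition: fraction of the cosine-to-PLDA minDCF gap that a
    system closes (0 = no better than cosine, 1 = as good as labeled PLDA)."""
    return (min_dcf_cosine - min_dcf_system) / (min_dcf_cosine - min_dcf_plda)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enroll = rng.standard_normal((5, 600))  # five hypothetical 600-dim session i-vectors
    test = rng.standard_normal(600)
    print(cosine_score(multisession_model(enroll), test))
    # Hypothetical minDCF values, not results from the paper:
    print(gap_filled(0.40, 0.30, 0.354))  # approximately 0.46
```

Cosine scoring needs no speaker labels, which is why it serves as the lower baseline here. PLDA training, by contrast, requires labeled background data. That asymmetry is what the gap figures in the abstract measure.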

Bibliographic details

Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing | 2017, No. 4 | pp. 807-817 | 11 pages
Authors
Omid Ghahabi; Javier Hernando
Affiliations

TALP Research Center, Department of Signal Theory and Communications, Universitat Politecnica de Catalunya—BarcelonaTech, Barcelona, Spain;

Indexing information
Original format: PDF
Language: English
Keywords
Speaker recognition; Speech; Adaptation models; Training; Machine learning; NIST; Speech processing

Similar literature

1. Deep Nonlinear Metric Learning for Speaker Verification in the I-Vector Space [J]. Yong Feng, Qingyu Xiong, Weiren Shi. IEICE Transactions on Information and Systems, 2017, No. 1.
2. Text-independent speaker recognition based on adaptive course learning loss and deep residual network [J]. Zhong Qinghua, Dai Ruining, Zhang Han. EURASIP Journal on Advances in Signal Processing, 2021, No. a.
3. An efficient algorithm for recognition of emotions from speaker and language independent speech using deep learning [J]. Singh Youddha Beer, Goel Shivani. Multimedia Tools and Applications, 2021, No. 9.
4. Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition [C]. Shuai Wang, Zili Huang, Yanmin Qian. International Symposium on Chinese Spoken Language Processing, 2018.
5. Multimodal Sensing and Data Processing for Speaker and Emotion Recognition Using Deep Learning Models with Audio, Video and Biomedical Sensors [D]. Abtahi, Farnaz. 2018.
6. Robust Single-Sample Face Recognition by Sparsity-Driven Sub-Dictionary Learning Using Deep Features [O]. Vittorio Cuculo, Alessandro D’Amelio, Giuliano Grossi. 2019.
7. Deep learning backend for single and multisession i-vector speaker recognition [O]. Ghahabi, Omid; Hernando Pericás, Francisco Javier. 2017.
8. Noise Robust I-Vector Extractor Using Vector Taylor Series for Speaker Recognition [R]. Lei, Y.; Burget, L.; Scheffer, N. 2013.
