Odyssey 2010: The Speaker and Language Recognition Workshop

An i-vector Extractor Suitable for Speaker Recognition with both Microphone and Telephone Speech



Abstract

It is widely believed that speaker verification systems perform better when sufficient background training data is available to deal with the nuisance effects of transmission channels. It is also known that these systems perform best when the acoustic environment of the training data is similar to that of the context of use (test context). For some applications, however, training data from the same type of acoustic environment is scarce, while a considerable amount of data from a different type of environment is available. In this paper, we propose a new architecture for text-independent speaker verification systems that can be satisfactorily trained with a limited amount of application-specific data, supplemented by a sufficient amount of training data from some other context.

This architecture is based on the extraction of parameters (i-vectors) from a low-dimensional space (total variability space) proposed by Dehak [1]. Our aim is to extend Dehak's work to speaker recognition on sparse data, namely microphone speech. The main challenge is to overcome the fact that insufficient application-specific data is available to accurately estimate the total variability covariance matrix. We propose a method based on Joint Factor Analysis (JFA) to estimate microphone eigenchannels (sparse data) alongside telephone eigenchannels (sufficient data).

For classification, we experimented with two approaches: Support Vector Machines (SVM) and the Cosine Distance Scoring (CDS) classifier, which is based on cosine distances. We present recognition results for the female portion of the interview data of the NIST 2008 SRE. The best performance is obtained when our system is fused with the state-of-the-art JFA system: a 13% relative improvement in equal error rate, with the minimum value of the detection cost function decreasing from 0.0219 to 0.0164.
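The CDS classifier mentioned in the abstract scores a verification trial as the cosine of the angle between the target-speaker i-vector and the test i-vector. A minimal sketch of that scoring rule (the vector dimensions and values below are illustrative, not taken from the paper):

```python
import math

def cds_score(w_target, w_test):
    """Cosine Distance Scoring: the cosine of the angle between the
    target-speaker i-vector and the test i-vector. Higher scores
    indicate the same speaker; a threshold yields accept/reject."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_t = math.sqrt(sum(a * a for a in w_target))
    norm_x = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_t * norm_x)

# Toy 3-dimensional "i-vectors" (real total-variability spaces are
# typically a few hundred dimensions; these values are made up).
target = [0.8, -0.2, 0.1]
same = [0.7, -0.25, 0.12]     # close in direction to the target
different = [-0.1, 0.9, 0.4]  # points elsewhere

assert cds_score(target, same) > cds_score(target, different)
```

Because the score depends only on the angle, not the vector lengths, CDS needs no heavyweight backend model at test time, which is part of its appeal over a full JFA scoring pass.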
