Robust speaker recognition based on latent variable models.


Abstract

Automatic speaker recognition in uncontrolled environments is a very challenging task due to channel distortions, additive noise, and reverberation. To address these issues, this thesis studies probabilistic latent variable models of short-term spectral information that leverage large amounts of data to achieve robustness in challenging conditions.

Current speaker recognition systems represent an entire speech utterance as a single point in a high-dimensional space. This representation is known as a "supervector". This thesis starts by analyzing the properties of this representation. A novel procedure for visualizing supervectors is presented, which gives qualitative insight into the information being captured. We then propose the use of an overcomplete dictionary to explicitly decompose a supervector into a speaker-specific component and an undesired variability component. An algorithm to learn the dictionary from a large collection of data is discussed and analyzed. A subset of the dictionary entries is learned to represent speaker-specific information and another subset to represent distortions. After encoding the supervector as a linear combination of the dictionary entries, the undesired variability is removed by discarding the contribution of the distortion components. This paradigm is closely related to the previously proposed Joint Factor Analysis modeling of supervectors. We establish a connection between the two approaches and show how our proposed method provides improvements in terms of computation and recognition accuracy.

An alternative way to handle undesired variability in supervector representations is to first project them into a lower-dimensional space and then model them in the reduced subspace. This low-dimensional projection is known as an "i-vector". Unfortunately, i-vectors exhibit non-Gaussian behavior, and direct statistical modeling requires the use of heavy-tailed distributions for optimal performance. Models based on heavy-tailed distributions lack closed-form solutions and are therefore hard to analyze. Moreover, they do not scale well to large datasets. Instead of directly modeling i-vectors, we propose to first apply a non-linear transformation and then use a linear-Gaussian model. We present two alternative transformations and show experimentally that the transformed i-vectors can be optimally modeled by a simple linear-Gaussian model (factor analysis). We evaluate our method on a benchmark dataset with a large amount of channel variability and show that the results compare favorably with those of competing approaches. In addition, our approach has closed-form solutions and scales gracefully to large datasets.

Finally, a multi-classifier architecture trained in a multicondition fashion is proposed to address the problem of speaker recognition in the presence of additive noise. A large number of experiments are conducted to analyze the proposed architecture and to obtain guidelines for optimal performance in noisy environments. Overall, it is shown that multicondition training of multi-classifier architectures not only produces great robustness in the anticipated conditions, but also generalizes well to unseen conditions.
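
The decompose-and-discard idea described for supervectors can be illustrated with a small numerical sketch. This is not the thesis's algorithm: the dictionary here is random rather than learned, the encoding is a ridge-regularized least-squares fit (an assumption), and the names `remove_distortion`, `d_speaker`, `d_distortion`, and `ridge` are hypothetical. The toy dictionary is also deliberately small rather than overcomplete so that the decomposition is unique; the ridge term is what would keep the encoding well-posed with more atoms than dimensions.

```python
import numpy as np


def remove_distortion(supervector, d_speaker, d_distortion, ridge=1e-3):
    """Encode a supervector over [d_speaker | d_distortion] and split the result.

    Returns (speaker_part, distortion_part). Keeping only speaker_part
    corresponds to discarding the contribution of the distortion atoms.
    The ridge-regularized encoder is an illustrative assumption.
    """
    dictionary = np.hstack([d_speaker, d_distortion])  # shape (dim, k_spk + k_dist)
    n_atoms = dictionary.shape[1]
    # Solve (D^T D + ridge * I) c = D^T s for the coefficient vector c.
    gram = dictionary.T @ dictionary + ridge * np.eye(n_atoms)
    coeffs = np.linalg.solve(gram, dictionary.T @ supervector)
    k_spk = d_speaker.shape[1]
    speaker_part = d_speaker @ coeffs[:k_spk]
    distortion_part = d_distortion @ coeffs[k_spk:]
    return speaker_part, distortion_part


# Synthetic data: a "supervector" built from known speaker and distortion parts.
rng = np.random.default_rng(0)
dim, k_spk, k_dist = 200, 10, 5
d_speaker = rng.standard_normal((dim, k_spk))
d_distortion = rng.standard_normal((dim, k_dist))
true_speaker = d_speaker @ rng.standard_normal(k_spk)
true_distortion = d_distortion @ rng.standard_normal(k_dist)
observed = true_speaker + true_distortion

cleaned, removed = remove_distortion(observed, d_speaker, d_distortion)
err_before = np.linalg.norm(observed - true_speaker)
err_after = np.linalg.norm(cleaned - true_speaker)
print(f"error before: {err_before:.3f}, error after: {err_after:.6f}")
```

In this toy setting the cleaned vector is much closer to the speaker component than the observed one, because the distortion lies entirely in the span of the distortion atoms. With real supervectors the two subsets of the dictionary would have to be learned from data, as the abstract describes.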


Bibliographic details

Author: Garcia-Romero, Daniel
Author affiliation: University of Maryland, College Park
Degree-granting institution: University of Maryland, College Park
Subject: Engineering Electronics and Electrical
Degree: Ph.D.
Year: 2012
Pages: 154 p.
Total pages: 154
Original format: PDF
Language: eng

