Advancements in robust algorithm formulation for dialect and speaker recognition.



Abstract

The speech signal carries many levels of information beyond the text content itself, such as speaker information (e.g., dialect/accent, gender, emotion, age, identity) and environment information (e.g., channel, background noise, room conditions). This thesis focuses on two tasks that identify such information in the speech signal: automatic dialect classification and automatic speaker recognition.

This thesis proposes two novel algorithms to improve dialect classification for text-independent spontaneous speech in both Arabic and Spanish, along with probe results for Chinese. The algorithms are formulated within a Gaussian mixture model (GMM) framework, using Kullback-Leibler divergence based mixture selection in the training phase and frame selection decoding in the testing phase. The main motivation of both algorithms is to suppress confusable/distracting regions of the dialect language space and emphasize discriminative/sensitive information from the available dialects. In addition, because the differences among dialects are very subtle, performance is more sensitive to mismatch from other components of the speech signal. To compensate for this mismatch and focus on the intrinsic dialect properties, the well-known factor analysis based mismatch compensation approach is used and extended to compensate for various distortions (e.g., gender, speaker, and channel) in dialect identification, so that only dialect information is emphasized, thereby improving overall performance.

The second thesis goal addresses speaker recognition, where factor analysis, one of the most important techniques, is widely used in model training and channel compensation. The correlation between speaker and distortion (e.g., channel and additive noise) is analyzed and modeled. A resulting simplified version of the model is then used to fit the factor analysis approach under a joint factor analysis framework, since factor analysis has proven very effective for performance improvement. Next, to avoid the approximation introduced by this simplification in the joint factor analysis framework, the total variability model is studied, and a new supervised approach is proposed to preserve more speaker-specific information than the total variability model, which is an unsupervised probabilistic principal component analysis approach. In addition, a combination of the proposed supervised approach and the traditional unsupervised approach is proposed and evaluated. Evaluations are performed on the NIST SRE-2008.

This thesis has therefore contributed improved modeling and classification strategies for dialect/accent and speaker recognition, based on leveraging discriminative knowledge learned during modeling. Such advancements will ultimately contribute to improved speech processing and language technology solutions.
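The abstract does not spell out how the Kullback-Leibler divergence based mixture selection works, so the following is only an illustrative sketch of one plausible reading, not the thesis's algorithm. It assumes two dialect GMMs with diagonal covariances whose components correspond by index (for example, both adapted from a shared background model), scores each component pair by symmetric KL divergence, and keeps the most divergent components as the "discriminative" ones. The function names and the `keep` parameter are hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (closed form)."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(x, dtype=float) for x in (mu_p, var_p, mu_q, var_q)
    )
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised divergence: KL(p || q) + KL(q || p)."""
    return kl_diag_gauss(mu_p, var_p, mu_q, var_q) + kl_diag_gauss(
        mu_q, var_q, mu_p, var_p
    )


def select_mixtures(means_a, vars_a, means_b, vars_b, keep):
    """Rank index-aligned components of two GMMs by symmetric KL.

    Assumption (not from the source): component i of dialect A corresponds
    to component i of dialect B. Returns the indices of the `keep` most
    divergent components (most divergent first) and all scores.
    """
    scores = np.array(
        [
            symmetric_kl(ma, va, mb, vb)
            for ma, va, mb, vb in zip(means_a, vars_a, means_b, vars_b)
        ]
    )
    order = np.argsort(-scores, kind="stable")
    return order[:keep], scores


if __name__ == "__main__":
    # Toy one-dimensional example with three components per dialect model.
    means_a = [[0.0], [0.0], [0.0]]
    means_b = [[0.0], [3.0], [1.0]]
    variances = [[1.0], [1.0], [1.0]]
    selected, scores = select_mixtures(means_a, variances, means_b, variances, keep=2)
    print(selected, scores)
```

In this toy setup the component whose means differ most between the two models ranks first, and a component that is identical in both models scores zero and is dropped, which mirrors the stated goal of suppressing confusable regions and emphasizing discriminative ones.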

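Bibliographic record

  • Author: Lei, Yun
  • Author affiliation: The University of Texas at Dallas
  • Degree-granting institution: The University of Texas at Dallas
  • Discipline: Engineering Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 164 p.
  • Total pages: 164
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Rehabilitation medicine
  • Keywords:
  • Date added: 2022-08-17 11:44:31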
