
Nonparametric modeling for speaker recognition.


Abstract

Recognizing people by their voices is an innate human ability, a natural phenomenon that happens almost without effort. However, recognition of people by a machine (computer), known as speaker recognition, is a challenging task. In this dissertation research, a novel approach to speaker recognition, based on nonparametric probability density function estimation, is developed. Speaker recognition has a wide variety of applications, ranging from creating user-friendly interactive voice response systems to detecting criminal and terrorist activities over the telephone. Characterizing a person's voice and creating a representation of it is known as speaker modeling, and this challenging task is the central theme of this research. Firstly, speaker modeling is identified as a process of estimating long-term statistical properties, which are in turn obtained by estimating the probability density function of characteristic features (which preserve speaker information) extracted from speech data. Secondly, it is observed that the conventional methods of developing speaker models are based on inappropriate (or even incorrect) assumptions about the density functions of the features. A class of statistical models known as nonparametric models, in contrast to the traditional parametric models, is introduced to model speakers. Such models make no a priori assumptions about the underlying distributions of the features, and thus let the data determine the speaker model rather than enforcing a functional form. Novel strategies for performing speaker recognition tasks, such as speaker identification and speaker verification, were developed. Experimental evaluation of the nonparametric methods was performed under various real-life scenarios, and the nonparametric system was found to have higher performance and lower sensitivity to noise than the parametric system.
Methods for analyzing conversational speech data, which consists of voice data from one or more speakers, are also introduced. It was found that the analysis of conversational data becomes extremely difficult when the speech is recorded over a telephone, where only a very limited amount of data is available. Simple algorithms to perform speaker clustering and speaker counting were developed. These methods were evaluated, and the results indicate that the proposed methods have the potential to perform well in real-world conditions.
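
The abstract does not say which nonparametric estimator the dissertation uses. As an illustration only, the sketch below uses a Gaussian kernel density estimate, one common nonparametric estimator, to show how speaker identification can follow from density estimation: each speaker model is simply that speaker's training feature vectors, and a test utterance is assigned to the speaker whose estimated density gives it the highest average log-likelihood. The function names, the bandwidth value, and the synthetic two-dimensional "features" are all assumptions made for this example; they are not taken from the dissertation.

```python
import numpy as np


def kde_log_likelihood(test, train, bandwidth):
    """Average log-likelihood of test frames under a Gaussian KDE.

    The density is built from the training frames; no functional form
    (such as a single Gaussian or a mixture) is assumed for the speaker.
    """
    dim = train.shape[1]
    # Pairwise squared distances, scaled by the (assumed) bandwidth.
    diff = test[:, None, :] - train[None, :, :]
    scaled_sq = np.sum(diff ** 2, axis=2) / bandwidth ** 2
    # Log of an isotropic Gaussian kernel centred on each training frame.
    log_kernel = (-0.5 * scaled_sq
                  - 0.5 * dim * np.log(2.0 * np.pi)
                  - dim * np.log(bandwidth))
    # Numerically stable log of the mean kernel value per test frame.
    peak = log_kernel.max(axis=1, keepdims=True)
    log_density = peak[:, 0] + np.log(np.mean(np.exp(log_kernel - peak), axis=1))
    return float(np.mean(log_density))


def identify(test, models, bandwidth=0.5):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {name: kde_log_likelihood(test, feats, bandwidth)
              for name, feats in models.items()}
    return max(scores, key=scores.get), scores


# Synthetic stand-ins for per-frame speech features (hypothetical data).
rng = np.random.default_rng(0)
models = {
    "speaker_a": rng.normal(loc=-2.0, scale=1.0, size=(200, 2)),
    "speaker_b": rng.normal(loc=2.0, scale=1.0, size=(200, 2)),
}
test_utterance = rng.normal(loc=2.0, scale=1.0, size=(50, 2))

best, scores = identify(test_utterance, models)
print(best, scores)
```

Under the same assumptions, speaker verification could be sketched by comparing the claimed speaker's score against a threshold or against a background model's score, rather than taking the maximum over all enrolled speakers.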

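Bibliographic record

  • Author

    Iyer, Ananth N.;

  • Author affiliation

    Temple University.;

  • Degree-granting institution Temple University.;
  • Subject Engineering Electronics and Electrical.
  • Degree Ph.D.
  • Year 2007
  • Pages 105 p.
  • Total pages 105
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Radio electronics, telecommunications technology;
  • Keywords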
