
Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams



Abstract

With an explosive increase in the amount of multimedia content available worldwide and through the web, automatically detecting who spoke when in an audio stream is an important technique with many practical applications. The task of automatically annotating speech segments with speaker labels can be treated as either a speaker recognition or a speaker diarization problem, depending on whether voice samples of the speakers are available as a priori knowledge. Despite the differences, the success of both speaker recognition and speaker diarization hinges on accurate and robust modeling of speaker voice characteristics. Over the past several decades, the technology of statistical speaker modeling has achieved significant advancements. However, real-world applications of speaker modeling technology, by means of speaker recognition and speaker diarization, still show considerably limited performance. In this dissertation, we investigate the applications of speaker recognition and speaker diarization on the National Aeronautics and Space Administration (NASA) Apollo-11 mission audio corpus to advance their performance in practical applications. In the first part of this dissertation, we focus on understanding the problems and challenges of applying speaker recognition techniques to a subset of the Apollo-11 space-to-ground audio corpus to automatically recognize all three astronauts. Specifically, we investigate the variations in the astronauts' voice characteristics across different phases of the lunar mission and their impact on speaker recognition performance. In the second part of this dissertation, we focus on the development of robust speaker recognition and diarization algorithms.
We illustrate the challenge of applying speaker diarization techniques to multi-speaker naturalistic audio streams such as the Apollo-11 mission control center (MCC) audio corpus, and propose active-learning-based algorithms to effectively incorporate limited human effort into the current speaker diarization process. Moreover, we propose several robust speaker modeling techniques that improve speaker recognition in generally mismatched or noisy environments. Lastly, the application of speaker recognition and speaker diarization to conversation analysis on the Apollo-11 MCC audio corpus is discussed. This dissertation therefore advances speech and language technology to address diarization of multi-speaker naturalistic audio streams for real task-oriented teams. It is expected that these advancements will contribute significantly to research on human-to-human voice interaction for team-oriented tasks in business, social, government, and security applications.

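Bibliographic record

  • Author

    Yu, Chengzhu

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution: The University of Texas at Dallas
  • Discipline: Electrical engineering
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 146 p.
  • Total pages: 146
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Rehabilitation medicine
  • Keywords

  • Date indexed: 2022-08-17 11:54:27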
