
Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams

Abstract

With the explosive increase in the amount of multimedia content available worldwide and through the web, automatically detecting who spoke when in an audio stream is an important technique with many practical applications. The task of automatically annotating speech segments with speaker labels can be treated as either a speaker recognition or a speaker diarization problem, depending on whether voice samples of the speakers are available as a priori knowledge. Despite the differences, the success of both speaker recognition and speaker diarization hinges on accurate and robust modeling of speaker voice characteristics. Over the past several decades, the technology of statistical speaker modeling has achieved significant advancements. However, real-world applications of speaker modeling technology, in the form of speaker recognition and speaker diarization, still show considerably limited performance. In this dissertation, we investigate the application of speaker recognition and speaker diarization to the National Aeronautics and Space Administration (NASA) Apollo-11 mission audio corpus in order to advance their performance in practical applications. In the first part of this dissertation, we focus on understanding the problems and challenges of applying speaker recognition techniques to a subset of the Apollo-11 space-to-ground audio corpus to automatically recognize all three astronauts. Specifically, we investigate the variations in the astronauts' voice characteristics across different phases of the lunar mission and their impact on speaker recognition performance. In the second part of this dissertation, we focus on the development of robust speaker recognition and diarization algorithms.
We illustrate the challenge of applying speaker diarization techniques to multi-speaker naturalistic audio streams such as the Apollo-11 mission control center (MCC) audio corpus, and propose active-learning-based algorithms that effectively incorporate limited human effort into the current speaker diarization process. Moreover, we propose several robust speaker modeling techniques that improve speaker recognition in generally mismatched or noisy environments. Lastly, the application of speaker recognition and speaker diarization to conversation analysis on the Apollo-11 MCC audio corpus is discussed. This dissertation therefore advances speech and language technology to address diarization of multi-speaker naturalistic audio streams for real task-oriented teams. It is expected that these advancements will contribute significantly to research on human-to-human voice interaction for team-oriented tasks in business, social, government, and security applications.


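Bibliographic record

Author: Yu, Chengzhu
Author affiliation: The University of Texas at Dallas
Degree-granting institution: The University of Texas at Dallas
Discipline: Electrical engineering
Degree: Ph.D.
Year: 2017
Pages: 146 p.
Original format: PDF
Language of text: eng
Chinese Library Classification: Rehabilitation medicine
Date indexed: 2022-08-17 11:54:27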

