
Audio parsing and rapid speaker adaptation in speech recognition for spoken document retrieval.


Abstract

This thesis addresses a number of research issues in developing an effective on-line spoken document retrieval system based on large vocabulary continuous speech recognition (LVCSR). Within this framework, the primary contributions fall into three distinct yet related areas.

The first contribution addresses the problem of efficient audio stream parsing. Here, an extension to the previously proposed Bayesian Information Criterion (BIC) based algorithm is formulated as T²-BIC, by integrating Hotelling's T²-statistic into BIC. The proposed algorithm demonstrates a significant improvement in computational speed together with superior parsing performance.

Second, novel rapid model adaptation techniques, entitled Eigenspace Mapping, represent a primary contribution of this thesis. The idea of Eigenspace Mapping is to construct discriminative acoustic models for the test speaker by preserving the dominant discriminating power of the baseline model along the test speaker's first primary eigendirections. The adaptation is accomplished through a linear transformation in the model space. Based on this key idea, a number of algorithms can be formulated, such as EigMap, along with extensions that use different objective functions, including Structural Maximum Likelihood Eigenspace Mapping. Unsupervised adaptation experiments show that the proposed algorithms are effective with very limited amounts of adaptation data. Furthermore, the proposed algorithms are highly additive to other traditional methods such as MLLR, because they bring additional discrimination information.

Finally, the last contribution focuses on an experimental on-line spoken document retrieval system, SpeechFind, which is designed and implemented by incorporating state-of-the-art LVCSR and information retrieval (IR) technologies. In addition to system development efforts, contributions have been made to enhance the quality of automatic transcripts, and several methods, such as query and document expansion, have been developed to overcome the issue of IR over corrupted transcripts.

Collectively, the contributions made in these three related areas have resulted in an effective and integrated on-line spoken document retrieval system. Moreover, the proposed audio parsing and novel rapid speaker adaptation algorithms have helped advance the state of the art in robust speech recognition.
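
The abstract names the two ingredients of the audio parsing contribution (Hotelling's T²-statistic and BIC) but does not give the algorithm itself. The following is only a minimal sketch of one plausible way the two could be combined: a cheap T² scan proposes a candidate change point, and the standard full-covariance Gaussian ΔBIC test confirms or rejects it. The function names, the `margin` and `penalty_weight` parameters, and the two-stage arrangement are assumptions made for illustration, not the thesis's T²-BIC formulation.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 statistic using a pooled covariance estimate."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))


def delta_bic(x, y, penalty_weight=1.0):
    """Standard Delta-BIC for one full-covariance Gaussian versus two.

    A positive value favours a change point between x and y.
    penalty_weight is a tuning factor (hypothetical default of 1.0).
    """
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet(a):
        # Log-determinant of the maximum-likelihood covariance estimate.
        return np.linalg.slogdet(np.cov(a, rowvar=False, bias=True))[1]

    gain = 0.5 * (n * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y))
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def best_split(frames, margin=20):
    """Return the frame index with the largest T^2 (assumed first stage).

    margin keeps both sides large enough for a non-singular covariance.
    """
    scores = [hotelling_t2(frames[:i], frames[i:])
              for i in range(margin, len(frames) - margin)]
    return margin + int(np.argmax(scores))


if __name__ == "__main__":
    # Synthetic "feature frames": the mean shifts halfway through.
    rng = np.random.default_rng(0)
    frames = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)),
                        rng.normal(3.0, 1.0, size=(200, 4))])
    i = best_split(frames)
    print("candidate index:", i)
    print("change point accepted:", delta_bic(frames[:i], frames[i:]) > 0)
```

In this arrangement the T² scan uses only means and a pooled covariance, while the ΔBIC test, which needs three covariance determinants, is evaluated once per window instead of once per candidate frame.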

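Bibliographic record

  • Author

    Zhou, Bowen

  • Author affiliation

    University of Colorado at Boulder

  • Degree-granting institution: University of Colorado at Boulder
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 167 p.
  • Total pages: 167
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Radio electronics, telecommunication technology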