
Multimodal fusion with applications to audio-visual speech recognition.



Abstract

This study considers the fundamental problem of multimodal fusion in the context of pattern recognition tasks in human-computer interfaces (HCI). Specifically, the research stems from two basic recognition problems: first, automatic speech recognition; and second, biometrics, i.e., person recognition. In both cases, multiple cues carried in different modalities are often available for the recognition targets. Thus, the multiple information sources may be modeled or evaluated jointly to improve recognition performance, especially under adverse ambient conditions. This motivation leads, respectively, to audio-visual speech recognition and multichannel biometrics. A crucial problem that arises in these multimodal approaches is how to carry out fusion to best take advantage of the available information.

Differences in the characteristics of the intermodal couplings in audio-visual speech recognition and in multichannel biometrics defy a universal fusion method for both applications. For audio-visual speech modeling, we propose a novel sensory fusion method based on coupled hidden Markov models (CHMMs). The CHMM framework allows the fusion of two temporally coupled information sources to take place as an integral part of the statistical modeling process. An important advantage of the CHMM-based fusion method lies in its ability to model asynchronies between the audio and visual channels. We describe two approaches to carrying out inference and learning in CHMMs. The first is an exact algorithm derived by extending the forward-backward procedure used in hidden Markov model (HMM) inference. The second relies on a model transformation strategy that maps the state space of a CHMM onto the state space of a classic HMM, and therefore facilitates the development of sophisticated audio-visual speech recognition systems using existing infrastructures.
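The model-transformation strategy mentioned above can be illustrated with a minimal sketch. This is not the author's implementation; it assumes a two-chain CHMM in which each chain's transition distribution conditions on both chains' previous states, and shows how such a model maps onto a classic HMM over the Na×Nv product states, so that standard HMM inference machinery can be reused:

```python
import numpy as np

# Hypothetical sizes: Na audio states, Nv visual states (illustrative only).
Na, Nv = 3, 2
rng = np.random.default_rng(0)

def random_cond(shape):
    """Random conditional distribution, normalized over the last axis."""
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

# CHMM transitions: each chain conditions on BOTH chains' previous states.
# Pa[a, v, a'] = P(a_t = a' | a_{t-1} = a, v_{t-1} = v)
# Pv[a, v, v'] = P(v_t = v' | a_{t-1} = a, v_{t-1} = v)
Pa = random_cond((Na, Nv, Na))
Pv = random_cond((Na, Nv, Nv))

# Model transformation: build the transition matrix of an equivalent HMM
# whose states are the Na*Nv product states (a, v). Given conditional
# independence of the two chains' next states, the product transition is
#   T[(a, v), (a', v')] = Pa[a, v, a'] * Pv[a, v, v']
T = np.einsum('avx,avy->avxy', Pa, Pv).reshape(Na * Nv, Na * Nv)

# Each row of T is a valid distribution over the product states, so T can
# be fed to any standard HMM forward-backward or Viterbi routine.
assert np.allclose(T.sum(axis=1), 1.0)
```

The trade-off of this transformation is state-space growth (Na·Nv states), but it lets an existing HMM toolkit handle the coupled model without modification.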
For multichannel biometrics, we introduce a general formulation based on the late integration paradigm and address the environmental robustness issue through multichannel fusion. Based on this formulation, two effective approaches to environment-adaptive decision fusion are developed: the environmental confidence weighting method and the optimal channel weighting method.
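A common way to realize late-integration decision fusion with confidence weighting is a weighted sum of per-channel log-likelihoods, with the stream weight adapted to the estimated reliability of each channel. The sketch below is an illustrative assumption, not the thesis's exact method; the `audio_confidence` parameter and the two-class example are hypothetical:

```python
def fuse_scores(log_audio, log_visual, audio_confidence):
    """Late-integration fusion: convex combination of per-channel
    log-likelihoods. audio_confidence in [0, 1] acts as an
    environment-adaptive stream weight (e.g. derived from estimated SNR)."""
    lam = audio_confidence
    return lam * log_audio + (1.0 - lam) * log_visual

def classify(candidates, audio_confidence):
    """candidates: dict mapping label -> (log_audio, log_visual).
    Returns the label with the highest fused score."""
    return max(candidates,
               key=lambda c: fuse_scores(*candidates[c], audio_confidence))

# Toy scores: audio favors 'yes', visual favors 'no'.
scores = {'yes': (-10.0, -30.0), 'no': (-20.0, -12.0)}
print(classify(scores, audio_confidence=0.9))  # clean audio -> 'yes'
print(classify(scores, audio_confidence=0.1))  # noisy audio -> 'no'
```

The point of environment-adaptive weighting is visible in the toy example: as the audio channel degrades, the weight shifts toward the visual channel and the fused decision changes accordingly.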

