
Multimodal fusion with applications to audio-visual speech recognition.



Abstract

This study considers the fundamental problem of multimodal fusion in the context of pattern recognition tasks in human-computer interfaces (HCI). Specifically, the research stems from two basic recognition problems: first, automatic speech recognition; and second, biometrics, i.e., person recognition. In both cases, multiple cues carried in different modalities are often available for the recognition targets. Thus, the multiple information sources may be modeled or evaluated jointly to improve recognition performance, especially under adverse ambient conditions. This motivation leads, respectively, to audio-visual speech recognition and multichannel biometrics. A crucial problem that arises in these multimodal approaches is how to carry out fusion to best take advantage of the available information.

Differences in the characteristics of the intermodal couplings in audio-visual speech recognition and in multichannel biometrics defy a universal fusion method for both applications. For audio-visual speech modeling, we propose a novel sensory fusion method based on coupled hidden Markov models (CHMMs). The CHMM framework allows the fusion of two temporally coupled information sources to take place as an integral part of the statistical modeling process. An important advantage of the CHMM-based fusion method lies in its ability to model asynchronies between the audio and visual channels. We describe two approaches to carrying out inference and learning in CHMMs. The first is an exact algorithm derived by extending the forward-backward procedure used in hidden Markov model (HMM) inference. The second relies on a model transformation strategy that maps the state space of a CHMM onto the state space of a classic HMM, and therefore facilitates the development of sophisticated audio-visual speech recognition systems using existing infrastructures.
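The model-transformation strategy mentioned above can be illustrated with a minimal sketch. This is not the author's implementation; it assumes a two-chain CHMM in which each chain's transition distribution conditions on both chains' previous states, and shows how such a model maps onto a classic HMM over the Na×Nv product states, so that standard HMM inference machinery can be reused:

```python
import numpy as np

# Hypothetical sizes: Na audio states, Nv visual states (illustrative only).
Na, Nv = 3, 2
rng = np.random.default_rng(0)

def random_cond(shape):
    """Random conditional distribution, normalized over the last axis."""
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

# CHMM transitions: each chain conditions on BOTH chains' previous states.
# Pa[a, v, a'] = P(a_t = a' | a_{t-1} = a, v_{t-1} = v)
# Pv[a, v, v'] = P(v_t = v' | a_{t-1} = a, v_{t-1} = v)
Pa = random_cond((Na, Nv, Na))
Pv = random_cond((Na, Nv, Nv))

# Model transformation: build the transition matrix of an equivalent HMM
# whose states are the Na*Nv product states (a, v). Given conditional
# independence of the two chains' next states, the product transition is
#   T[(a, v), (a', v')] = Pa[a, v, a'] * Pv[a, v, v']
T = np.einsum('avx,avy->avxy', Pa, Pv).reshape(Na * Nv, Na * Nv)

# Each row of T is a valid distribution over the product states, so T can
# be fed to any standard HMM forward-backward or Viterbi routine.
assert np.allclose(T.sum(axis=1), 1.0)
```

The trade-off of this transformation is state-space growth (Na·Nv states), but it lets an existing HMM toolkit handle the coupled model without modification.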
For multichannel biometrics, we introduce a general formulation based on the late integration paradigm and address the environmental robustness issue through multichannel fusion. Based on this formulation, two effective approaches to environment-adaptive decision fusion are developed: the environmental confidence weighting method and the optimal channel weighting method.
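A common way to realize late-integration decision fusion with confidence weighting is a weighted sum of per-channel log-likelihoods, with the stream weight adapted to the estimated reliability of each channel. The sketch below is an illustrative assumption, not the thesis's exact method; the `audio_confidence` parameter and the two-class example are hypothetical:

```python
def fuse_scores(log_audio, log_visual, audio_confidence):
    """Late-integration fusion: convex combination of per-channel
    log-likelihoods. audio_confidence in [0, 1] acts as an
    environment-adaptive stream weight (e.g. derived from estimated SNR)."""
    lam = audio_confidence
    return lam * log_audio + (1.0 - lam) * log_visual

def classify(candidates, audio_confidence):
    """candidates: dict mapping label -> (log_audio, log_visual).
    Returns the label with the highest fused score."""
    return max(candidates,
               key=lambda c: fuse_scores(*candidates[c], audio_confidence))

# Toy scores: audio favors 'yes', visual favors 'no'.
scores = {'yes': (-10.0, -30.0), 'no': (-20.0, -12.0)}
print(classify(scores, audio_confidence=0.9))  # clean audio -> 'yes'
print(classify(scores, audio_confidence=0.1))  # noisy audio -> 'no'
```

The point of environment-adaptive weighting is visible in the toy example: as the audio channel degrades, the weight shifts toward the visual channel and the fused decision changes accordingly.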

