IEEE Transactions on Audio, Speech, and Language Processing

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition


Abstract

While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models.
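The core compensation rule the abstract alludes to is simple to illustrate. If a clean feature x follows a Gaussian N(mu, var) and the measured feature y = x + n is corrupted by independent zero-mean noise of variance var_noise, then y follows N(mu, var + var_noise), so classification can proceed with the model variance inflated by the estimated measurement uncertainty. The following is a minimal Python sketch of this idea for two diagonal-Gaussian streams; it is an illustration under simplifying assumptions (single Gaussians per stream rather than the HMM/GMM state models used in the paper), and the function names are hypothetical, not taken from the authors' code.

```python
import numpy as np

def uc_log_likelihood(y, mu, var, var_noise):
    """Uncertainty-compensated Gaussian log-likelihood (diagonal covariance).

    If the clean feature x ~ N(mu, var) and the measurement is y = x + n
    with independent zero-mean noise of variance var_noise, then
    y ~ N(mu, var + var_noise): the model variance is simply inflated by
    the estimated measurement uncertainty.
    """
    v = var + var_noise
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + (y - mu) ** 2 / v)

def fused_score(y_audio, y_video, audio_model, video_model,
                noise_audio, noise_video):
    """Sum the per-stream compensated log-likelihoods for one frame.

    A stream whose noise-variance estimate is large yields a flatter
    likelihood and therefore contributes less to the fused score, so the
    audio/visual balance adapts frame by frame without explicit weights.
    """
    la = uc_log_likelihood(y_audio, *audio_model, noise_audio)
    lv = uc_log_likelihood(y_video, *video_model, noise_video)
    return la + lv

# Toy usage: one audio and one visual feature vector for a single frame.
y_a = np.array([0.9, -0.2])
y_v = np.array([0.1])
audio_model = (np.array([1.0, 0.0]), np.array([0.5, 0.5]))  # (mean, var)
video_model = (np.array([0.0]), np.array([0.3]))
# Hypothetical per-frame noise-variance estimates, e.g. from an audio
# enhancement front end and from active-appearance-model fitting residuals.
score = fused_score(y_a, y_v, audio_model, video_model,
                    noise_audio=np.array([0.2, 0.2]),
                    noise_video=np.array([0.05]))
print(score)
```

Because a noisier stream produces a flatter likelihood, its influence on the fused score shrinks automatically; this is the intuition behind the abstract's claim that fixed stream-weight fusion rules emerge from the compensation scheme as a special case under additional assumptions.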
