IEEE Transactions on Audio, Speech, and Language Processing

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition


Abstract

While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models.
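The core compensation rule the abstract alludes to is simple to illustrate. If a clean feature x follows a Gaussian N(mu, var) and the measured feature y = x + n is corrupted by independent zero-mean noise of variance var_noise, then y follows N(mu, var + var_noise), so classification can proceed with the model variance inflated by the estimated measurement uncertainty. The following is a minimal Python sketch of this idea for two diagonal-Gaussian streams; it is an illustration under simplifying assumptions (single Gaussians per stream rather than the HMM/GMM state models used in the paper), and the function names are hypothetical, not taken from the authors' code.

```python
import numpy as np

def uc_log_likelihood(y, mu, var, var_noise):
    """Uncertainty-compensated Gaussian log-likelihood (diagonal covariance).

    If the clean feature x ~ N(mu, var) and the measurement is y = x + n
    with independent zero-mean noise of variance var_noise, then
    y ~ N(mu, var + var_noise): the model variance is simply inflated by
    the estimated measurement uncertainty.
    """
    v = var + var_noise
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + (y - mu) ** 2 / v)

def fused_score(y_audio, y_video, audio_model, video_model,
                noise_audio, noise_video):
    """Sum the per-stream compensated log-likelihoods for one frame.

    A stream whose noise-variance estimate is large yields a flatter
    likelihood and therefore contributes less to the fused score, so the
    audio/visual balance adapts frame by frame without explicit weights.
    """
    la = uc_log_likelihood(y_audio, *audio_model, noise_audio)
    lv = uc_log_likelihood(y_video, *video_model, noise_video)
    return la + lv

# Toy usage: one audio and one visual feature vector for a single frame.
y_a = np.array([0.9, -0.2])
y_v = np.array([0.1])
audio_model = (np.array([1.0, 0.0]), np.array([0.5, 0.5]))  # (mean, var)
video_model = (np.array([0.0]), np.array([0.3]))
# Hypothetical per-frame noise-variance estimates, e.g. from an audio
# enhancement front end and from active-appearance-model fitting residuals.
score = fused_score(y_a, y_v, audio_model, video_model,
                    noise_audio=np.array([0.2, 0.2]),
                    noise_video=np.array([0.05]))
print(score)
```

Because a noisier stream produces a flatter likelihood, its influence on the fused score shrinks automatically; this is the intuition behind the abstract's claim that fixed stream-weight fusion rules emerge from the compensation scheme as a special case under additional assumptions.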
