Conference: Computer vision systems

Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks



Abstract

The interaction of image and speech processing is a crucial property of multimedia systems. Classical systems that draw inferences from purely qualitative high-level descriptions lose a lot of information when faced with erroneous, vague, or incomplete data. We propose a new architecture that integrates various levels of processing by using multiple representations of the visually observed scene. These representations are vertically connected by Bayesian networks in order to find the most plausible interpretation of the scene.

The interpretation of a spoken utterance naming an object in the visually observed scene is modeled as another partial representation of the scene. Under this concept, the key problem is identifying the verbally specified object instances in the visually observed scene. To this end, a Bayesian network is generated dynamically from the spoken utterance and the visual scene representation. This network encodes spatial knowledge as well as knowledge extracted from psycholinguistic experiments. First results show the robustness of our approach.
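The idea of resolving a spoken reference against uncertain visual evidence can be illustrated with a very small Bayesian model. The sketch below is not the network described in the abstract, whose structure, spatial relations, and psycholinguistic parameters are not given here. It assumes a single hidden variable (which scene object the speaker means), a uniform prior over objects, and a hypothetical noise model in which a spoken word matches the true attribute with probability `p_correct`. All names (`SceneObject`, `word_likelihood`, `identify_referent`) and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SceneObject:
    name: str
    type_scores: Dict[str, float]   # vision's belief over object types (sums to 1)
    color_scores: Dict[str, float]  # vision's belief over colors (sums to 1)


def word_likelihood(word: Optional[str], scores: Dict[str, float],
                    p_correct: float = 0.8, vocab_size: int = 4) -> float:
    """P(word | object), marginalising over the object's uncertain attribute.

    Assumed noise model: the speaker utters the true value with probability
    p_correct and any other of the vocab_size - 1 words uniformly otherwise.
    """
    if word is None:  # attribute not mentioned: no evidence
        return 1.0
    p_wrong = (1.0 - p_correct) / (vocab_size - 1)
    return sum(p * (p_correct if value == word else p_wrong)
               for value, p in scores.items())


def identify_referent(objects: List[SceneObject],
                      type_word: Optional[str] = None,
                      color_word: Optional[str] = None) -> Dict[str, float]:
    """Posterior over the intended object, assuming a uniform prior."""
    unnormalised = {
        obj.name: word_likelihood(type_word, obj.type_scores)
        * word_likelihood(color_word, obj.color_scores)
        for obj in objects
    }
    total = sum(unnormalised.values())
    return {name: value / total for name, value in unnormalised.items()}


# Hypothetical scene with vague visual recognition results.
scene = [
    SceneObject("obj1", {"bolt": 0.7, "cube": 0.3}, {"red": 0.9, "orange": 0.1}),
    SceneObject("obj2", {"bolt": 0.6, "bar": 0.4}, {"blue": 1.0}),
    SceneObject("obj3", {"cube": 1.0}, {"red": 0.6, "orange": 0.4}),
]

# Utterance: "the red bolt"
posterior = identify_referent(scene, type_word="bolt", color_word="red")
best = max(posterior, key=posterior.get)
print(best, posterior)
```

Because the visual scores are kept as distributions rather than hard labels, an object that is only probably a bolt can still be selected when the color evidence agrees, which is the kind of robustness to vague data the abstract argues for.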
