International Workshop on Machine Learning for Multimodal Interaction

Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers

Abstract

Multimodal scene understanding is an integral part of human-robot interaction (HRI) in situated environments. Especially useful is category-level recognition, where the system can recognize classes of objects or scenes rather than specific instances (e.g., any chair vs. this particular chair). Humans use multiple modalities to understand which object category is being referred to, simultaneously interpreting gesture, speech, and visual appearance, and using one modality to disambiguate the information contained in the others. In this paper, we address the problem of fusing visual and acoustic information to predict object categories when an image of the object and speech input from the user are available to the HRI system. Using probabilistic decision fusion, we show improved classification rates on a dataset containing a wide variety of object categories, compared to using either modality alone.
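
The abstract does not state which fusion rule is used, so the following is only a minimal sketch of one common form of probabilistic decision fusion: a weighted log-linear (product-rule) combination of the per-category posteriors from a speech classifier and an image classifier. The function name `fuse_posteriors`, the `speech_weight` parameter, the category names, and all probability values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
import numpy as np

def fuse_posteriors(p_speech, p_image, speech_weight=0.5):
    """Combine two class-posterior vectors with a weighted product rule.

    This is an illustrative assumption, not the paper's stated method.
    speech_weight in [0, 1] controls how much the speech classifier counts.
    """
    p_speech = np.asarray(p_speech, dtype=float)
    p_image = np.asarray(p_image, dtype=float)
    eps = 1e-12  # avoids log(0) for categories with zero probability
    log_fused = (speech_weight * np.log(p_speech + eps)
                 + (1.0 - speech_weight) * np.log(p_image + eps))
    log_fused -= log_fused.max()   # shift for numerical stability
    fused = np.exp(log_fused)
    return fused / fused.sum()     # renormalize to a probability vector

# Hypothetical example: speech confuses the similar-sounding "pen" and "pan",
# while the image classifier favors "pan".
categories = ["pen", "pan", "cup"]
p_speech = [0.48, 0.47, 0.05]
p_image = [0.15, 0.60, 0.25]

fused = fuse_posteriors(p_speech, p_image)
speech_only = categories[int(np.argmax(p_speech))]
fused_choice = categories[int(np.argmax(fused))]
print("speech alone:", speech_only)
print("fused:", fused_choice)
```

In this toy case the speech posterior alone narrowly prefers "pen", but the fused posterior selects "pan" because the visual evidence breaks the acoustic tie. This mirrors the abstract's point that one modality can disambiguate the other.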
