IEEE Robotics and Automation Letters

A Multimodal Target-Source Classifier With Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects



Abstract

In this study, we focus on multimodal language understanding for fetching instructions in the context of domestic service robots. This task consists of predicting a target object, as instructed by the user, given an image and an unstructured sentence such as "Bring me the yellow box (from the wooden cabinet)." This is challenging because of the ambiguity of natural language: the relevant information may be missing, or there may be several candidate targets. To solve this task, we propose the multimodal target-source classifier model with attention branches (MTCM-AB), an extension of the MTCM. Our method uses the attention branch network (ABN) to build a multimodal attention mechanism over linguistic and visual inputs. Experimental validation on a standard dataset showed that the MTCM-AB outperformed both state-of-the-art methods and the baseline MTCM. In particular, on the PFN-PIC dataset the MTCM-AB reached an average accuracy of 90.1, close to the human performance of 90.3.
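The core idea the abstract describes — attending over visual regions conditioned on the language input — can be sketched as follows. This is a minimal, illustrative dot-product attention in NumPy, not the authors' exact MTCM-AB/ABN architecture; all names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def multimodal_attention(visual_feats, ling_feat):
    """Score each candidate visual region against the sentence embedding,
    then pool the regions with the resulting attention weights.

    visual_feats: (n_regions, d) array of region embeddings
    ling_feat:    (d,) sentence embedding
    returns:      (attended (d,), attn (n_regions,))
    """
    scores = visual_feats @ ling_feat   # one relevance score per region
    attn = softmax(scores)              # attention map over regions
    attended = attn @ visual_feats      # language-conditioned visual summary
    return attended, attn

# Toy example: 3 candidate regions, 4-d embeddings
rng = np.random.default_rng(0)
visual = rng.standard_normal((3, 4))
lang = rng.standard_normal(4)
attended, attn = multimodal_attention(visual, lang)
```

In a classifier like the one described, the attended summary would then be fed to a prediction head over target (and source) candidates; the attention map itself is what the branch network supervises.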
