IEEE/CVF Conference on Computer Vision and Pattern Recognition

Unsupervised Textual Grounding: Linking Words to Image Concepts



Abstract

Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. Training these deep-net-based approaches requires access to large-scale datasets; however, constructing such a dataset is time-consuming and expensive. We therefore develop a completely unsupervised mechanism for textual grounding that uses hypothesis testing to link words to detected image concepts. We demonstrate our approach on the ReferIt Game dataset and the Flickr30k data, outperforming baselines by 7.98% and 6.96% respectively.
