Computer Vision and Image Understanding

Discovering hierarchical object models from captioned images

Abstract

We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.
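As a rough illustration of the core idea in the abstract, the toy sketch below associates caption words with recurring visual "parts" via simple co-occurrence statistics over a captioned image collection. This is not the authors' method; the part-cluster IDs, the PMI-style score, and the toy data are all assumptions introduced here for illustration, standing in for configurations of grouped local features.

```python
# Minimal toy sketch (assumed, not the paper's algorithm): score how strongly
# each caption word co-occurs with each visual part cluster across a dataset.
from collections import defaultdict
from itertools import product

def cooccurrence_scores(dataset):
    """dataset: list of (caption_words, part_cluster_ids) pairs.
    Returns score[word][part] = P(word, part) / (P(word) * P(part)),
    a simple pointwise-mutual-information-style association measure."""
    n = len(dataset)
    word_count = defaultdict(int)
    part_count = defaultdict(int)
    joint = defaultdict(int)
    for words, parts in dataset:
        words, parts = set(words), set(parts)
        for w in words:
            word_count[w] += 1
        for p in parts:
            part_count[p] += 1
        for w, p in product(words, parts):
            joint[(w, p)] += 1
    scores = defaultdict(dict)
    for (w, p), c in joint.items():
        scores[w][p] = (c / n) / ((word_count[w] / n) * (part_count[p] / n))
    return scores

# Hypothetical toy data: caption words paired with detected part-cluster IDs.
data = [
    (["red", "car", "street"], ["wheel_part", "body_part", "window_part"]),
    (["blue", "car"],          ["wheel_part", "body_part"]),
    (["dog", "park"],          ["fur_part", "leg_part"]),
    (["car", "parked"],        ["wheel_part", "body_part", "asphalt_part"]),
]

scores = cooccurrence_scores(data)
# Parts most associated with the word "car"; in the paper such word-part
# associations drive the grouping of parts into hierarchical multipart models.
print(sorted(scores["car"].items(), key=lambda kv: -kv[1]))
```

In the actual framework, the associated parts would additionally be grouped into scale-, translation- and rotation-invariant hierarchical configurations, which is what yields the improved localization and spatial coverage reported above.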
