Discovering hierarchical object models from captioned images

Michael Jamieson; Yulia Eskin; Afsaneh Fazly; Suzanne Stevenson; Sven J. Dickinson


Abstract

We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.
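The abstract reports annotation quality as precision and recall but does not spell out the evaluation protocol. As a minimal sketch, assuming the standard micro-averaged definitions over (image, object name) pairs, the two measures could be computed as below. The function name, the dictionary layout, and the toy data are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, Set, Tuple


def annotation_precision_recall(
    predicted: Dict[str, Set[str]],
    ground_truth: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (image, object name) pairs.

    Assumption: an annotation counts as correct when the predicted object
    name is in the ground-truth set for that image. Images absent from
    `ground_truth` are ignored.
    """
    true_pos = 0      # predicted names that are really present
    n_predicted = 0   # all predicted names
    n_actual = 0      # all names that are really present
    for image_id, truth in ground_truth.items():
        guess = predicted.get(image_id, set())
        true_pos += len(guess & truth)
        n_predicted += len(guess)
        n_actual += len(truth)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall


# Hypothetical toy data: three images, names a detector assigned vs. truth.
ground_truth = {"img1": {"car", "dog"}, "img2": {"dog"}, "img3": set()}
predicted = {"img1": {"car"}, "img2": {"dog", "cat"}, "img3": {"car"}}

precision, recall = annotation_precision_recall(predicted, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted names are correct and two of the three true names are found. Note that caption words are a noisy proxy for ground truth in the setting the abstract describes, so an evaluation of this kind would need separately verified labels.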


Bibliographic record

Source
Computer Vision and Image Understanding, 2012, issue 7, pp. 842-853 (12 pages)
Authors
Michael Jamieson; Yulia Eskin; Afsaneh Fazly; Suzanne Stevenson; Sven J. Dickinson
Author affiliations

Department of Computer Science, University of Toronto, 6 King's College Rd., Toronto, Ontario, Canada M5S 3C4 (listed for all five authors)
Original format: PDF
Language: English
Keywords
language-vision integration; object recognition; automatic image annotation; learning hierarchical models;


