Object Counts! Bringing Explicit Detections Back into Image Captioning


Abstract

The use of explicit object detectors as an intermediate step to image captioning - which used to constitute an essential stage in early work - is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding. We argue that explicit detections provide rich semantic information, and can thus be used as an interpretable representation to better understand why end-to-end image captioning systems work well. We provide an in-depth analysis of end-to-end image captioning by exploring a variety of cues that can be derived from such object detections. Our study reveals that end-to-end image captioning systems rely on matching image representations to generate captions, and that encoding the frequency, size and position of objects are complementary and all play a role in forming a good image representation. It also reveals that different object categories contribute in different ways towards image captioning.
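As a rough illustration of the cues named in the abstract (frequency, size and position of detected objects), the sketch below turns a list of detections into a fixed-length vector. It is only a minimal sketch under assumptions: the function name `encode_detections`, the `(label, box)` input format, and the choice of four values per category are hypothetical and are not taken from the paper.

```python
def encode_detections(detections, categories, image_width, image_height):
    """Encode detections as [count, area fraction, mean centre x, mean centre y] per category.

    detections: list of (label, (x_min, y_min, x_max, y_max)) in pixels (assumed format).
    categories: ordered list of category names that fixes the vector layout.
    """
    index = {name: i for i, name in enumerate(categories)}
    n_cat = len(categories)
    counts = [0.0] * n_cat      # frequency cue
    areas = [0.0] * n_cat       # size cue: summed box area / image area
    centre_x = [0.0] * n_cat    # position cue: summed normalised box centres
    centre_y = [0.0] * n_cat

    image_area = float(image_width * image_height)
    for label, (x_min, y_min, x_max, y_max) in detections:
        if label not in index:
            continue  # ignore categories outside the fixed vocabulary
        i = index[label]
        counts[i] += 1.0
        areas[i] += (x_max - x_min) * (y_max - y_min) / image_area
        centre_x[i] += (x_min + x_max) / 2.0 / image_width
        centre_y[i] += (y_min + y_max) / 2.0 / image_height

    vector = []
    for i in range(n_cat):
        n = counts[i]
        # Average the centres; categories with no detections get zeros.
        mean_x = centre_x[i] / n if n else 0.0
        mean_y = centre_y[i] / n if n else 0.0
        vector.extend([counts[i], areas[i], mean_x, mean_y])
    return vector


example = encode_detections(
    [("person", (0, 0, 50, 100)), ("person", (50, 0, 100, 100)), ("dog", (0, 0, 20, 20))],
    ["person", "dog"],
    image_width=100,
    image_height=100,
)
```

A vector like `example` could then stand in for a mid-level image embedding, which makes it possible to switch individual cues on or off and observe the effect on the generated captions.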


Bibliographic details

Source: Annual conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, 2018, pp. 2180-2193 (14 pages)
Conference location:
Authors: Josiah Wang; Pranava Madhyastha; Lucia Specia
Author affiliations:
Conference organizer:
Original format: PDF
Language of text:
CLC classification:
Keywords:
Date indexed: 2022-08-26 13:51:15

Similar literature

1. Deep learning for ultrasound image caption generation based on object detection [J]. Zeng Xianhua, Wen Li, Liu Banggui. Neurocomputing. 2020, No. Juna7.
2. An image caption method based on object detection [J]. Cao Danyang, Zhu Menggui, Gao Lei. Multimedia Tools and Applications. 2019, No. 24.
3. Moving Objects Detection in Image Sequence without Explicit Tracking [J]. Takayuki Kojima, Akihiro Minagawa, Norio Tagawa. 電子情報通信学会技術研究報告. 画像工学. Image Engineering. 2005, No. 501.
4. Object Counts! Bringing Explicit Detections Back into Image Captioning [C]. Josiah Wang, Pranava Madhyastha, Lucia Specia. Annual conference of the North American Chapter of the Association for Computational Linguistics: human language technologies. 2018.
5. Use of Remote Imagery and Object-based Image Methods to Count Plants in an Open-field Container Nursery [D]. Leiva Lopez, Josue Nahun. 2014.
6. Image Captioning Using Motion-CNN with Object Detection [O]. Kiyohiko Iwamura, Jun Younes Louhi Kasahara, Alessandro Moro. 2021.
7. Object Counts! Bringing Explicit Detections Back into Image Captioning [O]. Josiah Wang, Pranava Swaroop Madhyastha, Lucia Specia. 2018.
8. Quantum-Efficient Systematics-Free Photon-Counting Optical Imaging System for Long Baseline Interferometric Imaging of Faint Deep Space Objects [R]. Hege, E. K. 1990.
