IEEE Conference on Computer Vision and Pattern Recognition

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

Abstract

While recent deep neural network models have achieved promising results on the image captioning task, they rely largely on the availability of corpora with paired image and sentence captions to describe objects in context. In this work, we propose the Deep Compositional Captioner (DCC) to address the task of generating descriptions of novel objects which are not present in paired image-sentence datasets. Our method achieves this by leveraging large object recognition datasets and external text corpora and by transferring knowledge between semantically similar concepts. Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet. In contrast, our model can compose sentences that describe novel objects and their interactions with other objects. We demonstrate our model's ability to describe novel concepts by empirically evaluating its performance on MSCOCO and show qualitative results on ImageNet images of objects for which no paired image-sentence data exist. Further, we extend our approach to generate descriptions of objects in video clips. Our results show that DCC has distinct advantages over existing image and video captioning approaches for generating descriptions of new objects in context.
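The abstract leaves the transfer mechanism unspecified; one plausible reading of "transferring knowledge between semantically similar concepts" is that a novel word borrows the caption model's output-layer parameters from its closest in-vocabulary word under word-embedding similarity. The Python sketch below illustrates that idea only; the function name, array shapes, and toy vocabulary are hypothetical and not taken from the paper.

# A minimal sketch (not the paper's implementation): transfer the
# caption model's output-layer weights for a novel word from its
# closest known word under cosine similarity of word embeddings.
import numpy as np

def transfer_novel_word(W_out, vocab, embeddings, novel_word, known_words):
    """W_out: (vocab_size, hidden) output weights of a caption model.
    vocab: word -> row index. embeddings: word -> 1-D vector (e.g. word2vec).
    Copies the row of the most similar known word into the novel word's row
    and returns the word that was used as the transfer source."""
    v = embeddings[novel_word]
    v = v / np.linalg.norm(v)  # normalize once; cosine = u @ v / ||u||
    sims = {w: float(embeddings[w] @ v) / np.linalg.norm(embeddings[w])
            for w in known_words}
    source = max(sims, key=sims.get)  # semantically closest known word
    W_out[vocab[novel_word]] = W_out[vocab[source]]
    return source

# Toy usage with made-up data: "okapi" borrows weights from "zebra".
rng = np.random.default_rng(0)
vocab = {"zebra": 0, "horse": 1, "okapi": 2}
emb = {"zebra": np.array([1.0, 0.1]), "horse": np.array([0.2, 1.0]),
       "okapi": np.array([0.9, 0.2])}
W_out = rng.normal(size=(3, 4))
print(transfer_novel_word(W_out, vocab, emb, "okapi", ["zebra", "horse"]))

In practice the embedding space would be learned from the external text corpora the abstract mentions, so that words unseen in paired image-sentence data still have meaningful neighbors among words the caption model was trained on.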