Weakly-supervised image captioning based on rich contextual information

Zheng Hai-Tao; Wang Zhe; Ma Ningning; Chen Jinyuan; Xiao Xi; Sangaiah Arun Kumar

Abstract

Automatic generation of image descriptions is a challenging task that has attracted broad attention in artificial intelligence. Drawing on methods from computer vision and natural language processing, researchers have proposed various approaches to the problem. However, captions generated by existing approaches lack sufficient contextual information to describe the corresponding images completely. The labeled captions in the training set describe images only at a basic level and lack sufficient contextual annotations. In this paper, we propose a Weakly-supervised Image Captioning Approach (WICA) that generates captions containing rich contextual information without requiring complete annotations of that contextual information in the datasets. We use encoder-decoder neural networks to extract basic captioning features and object detection networks to identify contextual features. We then encode these two levels of features with a phrase-based language model to generate captions with rich contextual information. Comprehensive experimental results show that the proposed model outperforms existing baselines in the richness and reasonableness of the contextual information in generated captions.
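
The abstract combines two levels of features: a basic caption from an encoder-decoder network and contextual objects from an object detection network, merged by a phrase-based language model. The Python sketch below is only a toy illustration of that idea under stated assumptions: the decoder's caption and the detector's labels and confidence scores are supplied as mocked inputs, and the phrase-based language model is replaced by a fixed template that appends "with a ..." phrases. The names `Detection`, `enrich_caption`, `threshold`, and `max_phrases` are hypothetical and are not taken from WICA.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: an object label and its confidence score."""
    label: str
    score: float


def enrich_caption(base_caption: str, detections: list[Detection],
                   threshold: float = 0.5, max_phrases: int = 2) -> str:
    """Toy template (NOT WICA's phrase-based language model): append phrases
    for confident detections that the base caption does not already mention."""
    body = base_caption.rstrip(".")
    mentioned = set(body.lower().split())
    extra: list[str] = []
    # Consider detections from most to least confident.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < threshold:
            continue  # too uncertain to add as context
        if det.label in mentioned or det.label in extra:
            continue  # already described or already added
        extra.append(det.label)
        if len(extra) == max_phrases:
            break
    if not extra:
        return base_caption
    # Choose "a" or "an" from the first letter of each label.
    phrases = [("an " if lbl[0] in "aeiou" else "a ") + lbl for lbl in extra]
    if len(phrases) == 1:
        context = phrases[0]
    else:
        context = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"{body} with {context}."
```

For example, the base caption "A man riding a horse." with detections horse (0.98), dog (0.91), tree (0.7), and kite (0.3) becomes "A man riding a horse with a dog and a tree.": the horse is already mentioned and the kite falls below the threshold. A learned phrase-based model would presumably score and order candidate phrases instead of relying on a fixed template.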

Bibliographic record

Source: Multimedia Tools and Applications, 2018, No. 14, pp. 18583-18599 (17 pages)
Authors
Zheng Hai-Tao; Wang Zhe; Ma Ningning; Chen Jinyuan; Xiao Xi; Sangaiah Arun Kumar
Affiliations

Tsinghua Univ, Grad Sch Shenzhen, Tsinghua Southampton Web Sci Lab, Shenzhen, Guangdong, Peoples R China;

Tsinghua Univ, Grad Sch Shenzhen, Tsinghua Southampton Web Sci Lab, Shenzhen, Guangdong, Peoples R China;

Tsinghua Univ, Grad Sch Shenzhen, Tsinghua Southampton Web Sci Lab, Shenzhen, Guangdong, Peoples R China;

Tsinghua Univ, Grad Sch Shenzhen, Tsinghua Southampton Web Sci Lab, Shenzhen, Guangdong, Peoples R China;

Tsinghua Univ, Grad Sch Shenzhen, Shenzhen, Guangdong, Peoples R China;

VIT Univ, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India;

Original format: PDF
Language: English
Keywords: Image captioning; Weakly-supervised learning; Rich contextual information; Encoder-decoder neural networks; Object detection; Phrase-based language model

Similar documents

1. Scene graph captioner: Image captioning based on structural visual representation [J]. Xu Ning, Liu An-An, Liu Jing. Journal of Visual Communication & Image Representation, 2019, issue JANa.
2. Contextual Region of Interest Based Medical Image Compression using Contextual Listless SPIHT Algorithm for Brain Images [J]. Mrs. S. Sridevi, Dr. V. R. Vijayakumar. International Journal of Engineering and Technology, 2013, No. 5.
3. Enriching Video Captions With Contextual Text [C]. Philipp Rimle, Pelin Dogan-Schönberger, Markus Gross. International Conference on Pattern Recognition, 2021.
4. Visual attention patterns for contextually rich images: Neurotypical adults in two age groups and adults with aphasia [D]. Thiessen, Amber. 2013.
5. Caption-based topical descriptors for microscopic images of breast neoplasms as published in academic papers [O]. Sujin Kim, Shannon Lamkin, Pam Duncan.
6. Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style [O]. Hongwei Ge, Zehang Yan, Kai Zhang. 2019.
