Image description generation, or image captioning (IC), is the task of automatically generating a textual description for a given image. The generated text is expected to describe, generally in a single sentence, what is visually depicted in the image: the entities/objects present, their attributes, the actions/activities performed, entity/object interactions (including quantification), the location/scene, etc. (e.g. "a man riding a bike on the street"). Significant progress has been made with end-to-end approaches to this problem, in which parallel image-description datasets such as Flickr30k (Young et al., 2014) and MSCOCO (Chen et al., 2015) are used to train a CNN-RNN based neural network IC system (Vinyals et al., 2017; Karpathy and Fei-Fei, 2015; Xu et al., 2015). Such systems have demonstrated impressive performance in the COCO captioning challenge according to automatic metrics, seemingly even surpassing human performance in many instances (e.g. CIDEr score > 1.0 vs. humans' 0.85) (Chen et al., 2015). In reality, however, the performance of end-to-end systems remains far from satisfactory when judged by humans, so the task is far from being a solved problem.