Towards personalized image captioning via multimodal memory networks

Byeongchang Kim; Cesc Chunseong Park; Gunhee Kim

Abstract

We address personalized image captioning, which generates a descriptive sentence for a user’s image, accounting for prior knowledge such as her active vocabulary or writing style in her previous documents. As applications of personalized image captioning, we solve two post automation tasks in social networks: hashtag prediction and post generation. The hashtag prediction predicts a list of hashtags for an image, while the post generation creates a natural text consisting of normal words, emojis, and even hashtags. We propose a novel personalized captioning model named Context Sequence Memory Network (CSMN). Its unique updates over existing memory networks include: (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words into memory to capture long-term information, and (iii) adopting CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. For evaluation, we collect a new dataset InstaPIC-1.1M, comprising 1.1M Instagram posts from 6.3K users. We further use the benchmark YFCC100M dataset to validate the generality of our approach. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show that the three novel features of the CSMN help enhance the performance of personalized image captioning over state-of-the-art captioning models.
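The three memory features listed in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the class `ContextMemory`, its method names, the two-dimensional vectors, and the use of a plain sliding-window average in place of a learned CNN are all assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]


@dataclass
class ContextMemory:
    """Hypothetical toy memory: ordered slots, each tagged with a context type."""
    slots: List[Tuple[str, Vector]] = field(default_factory=list)

    def write(self, kind: str, vec: Vector) -> None:
        # (i) A single memory stores several kinds of context
        # (for example image features and a user's frequent words).
        self.slots.append((kind, vec))

    def append_generated(self, vec: Vector) -> None:
        # (ii) Each generated word is written back into memory,
        # so later decoding steps can still read it.
        self.write("generated", vec)

    def windows(self, width: int) -> List[Vector]:
        # (iii) Stand-in for a convolution (an assumption, not a learned CNN):
        # average every run of `width` adjacent slots so that neighbouring
        # slots are represented jointly.
        vecs = [v for _, v in self.slots]
        out = []
        for i in range(len(vecs) - width + 1):
            chunk = vecs[i:i + width]
            out.append([sum(col) / width for col in zip(*chunk)])
        return out


memory = ContextMemory()
memory.write("image", [1.0, 0.0])
memory.write("user_word", [0.0, 1.0])
memory.append_generated([1.0, 1.0])
joint = memory.windows(width=2)
```

In a real model the slot vectors would be learned embeddings and the window operation would use trained convolution filters; the sketch only shows the order of writes and how adjacent slots are combined.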

Bibliographic record

Source: IBRO Reports | 2019, Issue 3 | 1 page
Authors: Byeongchang Kim; Cesc Chunseong Park; Gunhee Kim
Original format: PDF
Chinese Library Classification: Neurology and psychiatry

Similar literature

1. Towards personalized image captioning via multimodal memory networks [J]. Byeongchang Kim, Cesc Chunseong Park, Gunhee Kim. IBRO Reports. 2019, Issue 1
2. Towards Personalized Image Captioning via Multimodal Memory Networks [J]. Park Cesc Chunseong, Kim Byeongchang, Kim Gunhee. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019, Issue 4
3. Multimodal architecture for video captioning with memory networks and an attention mechanism [J]. Li Wei, Guo Dashan, Fang Xiangzhong. Pattern Recognition Letters. 2018, Issue APRa1
4. Attend to You: Personalized Image Captioning with Context Sequence Memory Networks [C]. Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim. IEEE Conference on Computer Vision and Pattern Recognition. 2017
5. Ensemble Learning on Deep Neural Networks for Image Caption Generation [D]. Katpally, Harshitha. 2019
6. Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer’s Disease using structural MR and FDG-PET images [O]. Donghuan Lu, Karteek Popuri, Gavin Weiguang Ding
7. Attend to You: Personalized Image Captioning with Context Sequence Memory Networks [O]. Park, Cesc Chunseong, Kim, Byeongchang, Kim, Gunhee. 2017
