Home > Foreign-language journals > IBRO Reports > Towards personalized image captioning via multimodal memory networks
Towards personalized image captioning via multimodal memory networks

Abstract

We address personalized image captioning, which generates a descriptive sentence for a user’s image, accounting for prior knowledge such as her active vocabulary or writing style in her previous documents. As applications of personalized image captioning, we solve two post automation tasks in social networks: hashtag prediction and post generation. The hashtag prediction predicts a list of hashtags for an image, while the post generation creates a natural text consisting of normal words, emojis, and even hashtags. We propose a novel personalized captioning model named Context Sequence Memory Network (CSMN). Its unique updates over existing memory networks include: (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words into memory to capture long-term information, and (iii) adopting CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. For evaluation, we collect a new dataset InstaPIC-1.1M, comprising 1.1M Instagram posts from 6.3K users. We further use the benchmark YFCC100M dataset to validate the generality of our approach. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show that the three novel features of the CSMN help enhance the performance of personalized image captioning over state-of-the-art captioning models.
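
The three memory features listed in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`conv_over_slots`, `read_memory`, `generate`), the embedding size, the attention form, and the random untrained weights are all assumptions made for illustration. It only shows one memory holding several context types (an image feature plus a user's active-vocabulary words), generated words being appended to that memory, and a convolution along the ordered slot axis.

```python
import numpy as np

DIM = 8      # hypothetical embedding size
VOCAB = 20   # hypothetical vocabulary size


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def conv_over_slots(memory, kernel):
    """Valid 1D convolution along the slot axis, followed by ReLU.

    memory: (n_slots, DIM); kernel: (width, DIM, DIM).
    Each output row mixes `width` neighbouring memory slots.
    """
    width = kernel.shape[0]
    n_out = memory.shape[0] - width + 1
    out = np.zeros((n_out, kernel.shape[2]))
    for i in range(n_out):
        for k in range(width):
            out[i] += memory[i + k] @ kernel[k]
    return np.maximum(out, 0.0)


def read_memory(memory, query, kernel):
    """Attend over slots, convolve neighbouring slots, max-pool to one vector."""
    attn = softmax(memory @ query)
    weighted = memory * attn[:, None]
    return conv_over_slots(weighted, kernel).max(axis=0)


def generate(image_feat, user_word_ids, embed, kernel, steps=4, start_id=0):
    # (i) one memory holds several context types: image slot + user words
    memory = np.vstack([image_feat[None, :], embed[user_word_ids]])
    prev, words = start_id, []
    for _ in range(steps):
        # (iii) the read uses a convolution over ordered slots
        summary = read_memory(memory, embed[prev], kernel)
        prev = int(np.argmax(embed @ summary))
        words.append(prev)
        # (ii) the generated word is appended to memory as a new slot
        memory = np.vstack([memory, embed[prev][None, :]])
    return words, memory


rng = np.random.default_rng(0)
embed = rng.normal(size=(VOCAB, DIM))          # untrained word embeddings
kernel = rng.normal(size=(3, DIM, DIM)) * 0.1  # untrained conv filter, width 3
words, memory = generate(rng.normal(size=DIM), [3, 7, 11, 5], embed, kernel)
print(words, memory.shape)
```

With random weights the generated word ids are meaningless. The point is only that the memory grows by one slot per decoding step, so later steps can read earlier outputs directly rather than through a recurrent state.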
