Conference: IEEE International Conference on Mechatronics and Automation

Improving Caption Consistency to Image with Semantic Filter by Adversarial Training

Abstract

Benefiting from larger-scale datasets, image captioning has achieved remarkable success in generating more humanlike captions. However, for specific tasks (e.g., stylized image captioning) trained on small-scale datasets, the visual objects and semantic diversity are generally insufficient. Although the generated captions are suitable, they still fall short of depicting the image with comprehensive visual objects, which reduces the fluency and accuracy of the expressions. To address this issue, we propose an image captioning system based on an adversarial training strategy. To improve accuracy, a semantic filter module is implemented to obtain informative context from the semantic vectors. With an architecture of two separate LSTMs, our model learns the image features and semantic vectors at the global and local levels. Through adversarial training, the generated caption integrates accurate information and is expressed in a fluent style. Experimental results show the outstanding performance of our approach in capturing semantic knowledge on the FlickrStyle10K dataset. Linguistic analysis demonstrates that our model succeeds in improving the accuracy and fluency of generated captions.
