CapVis: Toward Better Understanding of Visual-Verbal Saliency Consistency

Abstract

When looking at an image, humans shift their attention toward interesting regions, making sequences of eye fixations. When describing an image, they also come up with simple sentences that highlight the key elements in the scene. What is the correlation between where people look and what they describe in an image? To investigate this problem intuitively, we develop a visual analytics system, CapVis, to look into visual attention and image captioning, two types of subjective annotations that are relatively task-free and natural. Using these annotations, we propose a word-weighting scheme to extract visual and verbal saliency ranks to compare against each other. In our approach, a number of low-level and semantic-level features relevant to visual-verbal saliency consistency are proposed and visualized for a better understanding of image content. Our method also shows the different ways that a human and a computational model look at and describe images, which provides reliable information for a captioning model. Experiments also show that the visualized feature can be integrated into a computational model to effectively predict the consistency between the two modalities on an image dataset with both types of annotations.

Bibliographic record

  • Source
    ACM transactions on intelligent systems | 2019, Issue 1 | pp. 10.1-10.23 | 23 pages
  • Author affiliations

    Zhejiang Univ Technol, Dept Informat Engn, 288 Liuhe Rd, Hangzhou 310013, Zhejiang, Peoples R China;

    Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA;

    Zhejiang Univ Technol, Dept Informat Engn, 288 Liuhe Rd, Hangzhou 310013, Zhejiang, Peoples R China;

    Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Image captioning; visual saliency; visual analytics;

  • Date added to database: 2022-08-18 04:16:06
