AAAI Conference on Artificial Intelligence

Using Co-Captured Face, Gaze and Verbal Reactions to Images of Varying Emotional Content for Analysis and Semantic Alignment



Abstract

Analyzing different modalities of expression can provide insights into the ways that humans interpret, label, and react to images. Such insights have the potential not only to advance our understanding of how humans coordinate these expressive modalities but also to enhance existing methodologies for common AI tasks such as image annotation and classification. We conducted an experiment that co-captured the facial expressions, eye movements, and spoken language data that observers produce while examining images of varying emotional content and responding to description-oriented vs. affect-oriented questions about those images. We analyzed the facial expressions produced by the observers in order to determine the connection between those expressions and an image's emotional content. We also explored the relationship between the valence of an image and the verbal responses to that image, and how that relationship relates to the nature of the prompt, using low-level lexical features and more complex affective features extracted from the observers' verbal responses. Finally, in order to integrate this multimodal data, we extended an existing bitext alignment framework to create meaningful pairings between narrated observations about images and the image regions indicated by eye movement data. The resulting annotations of image regions with words from observers' responses demonstrate the potential of bitext alignment for multimodal data integration and, from an application perspective, for annotation of open-domain images. In addition, we found that while responses to affect-oriented questions appear useful for image understanding, their holistic nature seems less helpful for image region annotation.
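To make the word-to-region pairing described above concrete, below is a minimal, hypothetical Python sketch. It is not the authors' extended bitext alignment framework; it only illustrates the kind of pairing the abstract describes, assuming time-stamped gaze fixations over labeled image regions and a time-aligned transcript of the verbal response. The names Fixation, SpokenWord, and pair_words_with_regions, as well as the greedy temporal-overlap heuristic, are illustrative assumptions rather than details from the paper.

from dataclasses import dataclass

@dataclass
class Fixation:
    region: str    # label of the fixated image region, e.g. "dog"
    start: float   # fixation onset (seconds)
    end: float     # fixation offset (seconds)

@dataclass
class SpokenWord:
    word: str
    start: float   # word onset (seconds)
    end: float     # word offset (seconds)

def overlap(a_start, a_end, b_start, b_end):
    # Length of temporal overlap between two intervals; 0 if disjoint.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def pair_words_with_regions(words, fixations):
    # Pair each spoken word with the image region whose fixation overlaps
    # it longest in time, or None if no fixation overlaps the word at all.
    pairs = []
    for w in words:
        best_region, best_overlap = None, 0.0
        for f in fixations:
            o = overlap(w.start, w.end, f.start, f.end)
            if o > best_overlap:
                best_region, best_overlap = f.region, o
        pairs.append((w.word, best_region))
    return pairs

if __name__ == "__main__":
    fixations = [Fixation("dog", 0.0, 1.2), Fixation("ball", 1.2, 2.5)]
    words = [SpokenWord("a", 0.1, 0.3), SpokenWord("dog", 0.3, 0.9),
             SpokenWord("chasing", 1.0, 1.6), SpokenWord("ball", 1.8, 2.3)]
    print(pair_words_with_regions(words, fixations))
    # -> [('a', 'dog'), ('dog', 'dog'), ('chasing', 'ball'), ('ball', 'ball')]

In the paper itself, the pairing is produced by extending an existing bitext alignment framework rather than by a greedy temporal heuristic like the one above.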
