Journal: ACM Transactions on Multimedia Computing, Communications, and Applications

Socializing the Videos: A Multimodal Approach for Social Relation Recognition

Abstract

As a crucial task in video analysis, social relation recognition for characters not only provides a semantically rich description of video content but also supports intelligent applications such as video retrieval and visual question answering. Unfortunately, because of the semantic gap between visual and semantic features, traditional solutions may fail to reveal the accurate relations among characters. At the same time, the growth of social media platforms has led to the emergence of crowdsourced comments, which may enhance the recognition task with semantic and descriptive cues. To that end, in this article, we propose a novel multimodal solution for the character relation recognition task. Specifically, we capture the target character pairs via a search module and then design a multistream architecture for jointly embedding the visual and textual information, in which feature fusion and an attention mechanism are adapted to better integrate the multimodal inputs. Finally, supervised learning is applied to classify character relations. Experiments on real-world data sets validate that our solution outperforms several competitive baselines.
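
The abstract does not specify how the visual and textual streams are fused or how attention is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes scaled dot-product attention that pools comment embeddings against a visual embedding of a character pair, fusion by concatenation, and a linear softmax classifier with random, untrained weights. All names (`attend`, `fuse`, `classify`) and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Pool `items` into one vector, weighting each by its scaled
    dot-product score against `query` (assumed attention form)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items))
              for d in range(dim)]
    return pooled, weights


def fuse(visual, textual):
    """Assumed fusion: plain concatenation of the two stream outputs."""
    return visual + textual


def classify(features, weight_rows, biases):
    """Linear layer followed by softmax over relation classes."""
    logits = [dot(row, features) + b for row, b in zip(weight_rows, biases)]
    return softmax(logits)


# Toy data: one visual embedding for a character pair and three
# comment embeddings, all random placeholders (not real features).
rng = random.Random(0)
visual = [rng.uniform(-1, 1) for _ in range(4)]
comments = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

text_vec, attn = attend(visual, comments)
fused = fuse(visual, text_vec)

num_classes = 3  # hypothetical number of relation classes
W = [[rng.uniform(-0.5, 0.5) for _ in range(len(fused))]
     for _ in range(num_classes)]
b = [0.0] * num_classes
probs = classify(fused, W, b)
print(attn, probs)
```

In a supervised setting, the weights `W` and `b` would be learned from labeled character pairs rather than drawn at random; the sketch only illustrates how the data flows from the two streams to a relation prediction.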