Conference: Machine intelligence and bio-inspired computation: theory and applications VII

Fusing video and text data by integrating appearance and behavior similarity


Abstract

In this paper, we describe an algorithm for multi-modal entity co-reference resolution and present experimental results using text and motion imagery data sources. Our model generates probabilistic associations between entities mentioned in text and entities detected in video data by jointly optimizing measures of appearance and behavior similarity. Appearance similarity is calculated as a match between proposition-derived entity attributes mentioned in text and the object appearance classification from video sources. Behavior similarity is calculated from semantic information about entity movements, actions, and interactions with other entities, as mentioned in text and detected in video sources. Our model achieved a 79% F-score for text-to-video entity co-reference resolution; we show that entity interactions provide distinctive features for resolving the variability present in text data and the ambiguity in the visual appearance of entities.
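A minimal sketch of the idea described above: combining an appearance score (text attributes vs. video appearance labels) and a behavior score (overlap of semantic movement/action/interaction labels) into a normalized association distribution. All class names, functions, weights, and labels here are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of text-to-video entity association by fusing
# appearance and behavior similarity (hypothetical data structures and weights).

from dataclasses import dataclass


@dataclass
class TextEntity:
    """Entity mentioned in text: proposition-derived attributes and behaviors."""
    attributes: set        # e.g. {"red", "truck"}
    behaviors: set         # e.g. {"moving_north", "stops_near:building"}


@dataclass
class VideoEntity:
    """Entity detected in video: appearance classification and observed behaviors."""
    appearance_labels: dict  # classifier label -> confidence, e.g. {"truck": 0.8}
    behaviors: set           # semantic movement/interaction labels


def appearance_similarity(t: TextEntity, v: VideoEntity) -> float:
    """Match text attributes against the video appearance classification."""
    if not t.attributes:
        return 0.0
    matched = sum(v.appearance_labels.get(a, 0.0) for a in t.attributes)
    return matched / len(t.attributes)


def behavior_similarity(t: TextEntity, v: VideoEntity) -> float:
    """Jaccard overlap of semantic behavior labels (movements, actions, interactions)."""
    union = t.behaviors | v.behaviors
    if not union:
        return 0.0
    return len(t.behaviors & v.behaviors) / len(union)


def association_scores(text_entities, video_entities, w_app=0.5, w_beh=0.5):
    """For each text entity, return a normalized distribution over video entities."""
    result = []
    for t in text_entities:
        scores = [
            w_app * appearance_similarity(t, v) + w_beh * behavior_similarity(t, v)
            for v in video_entities
        ]
        total = sum(scores) or 1.0
        result.append([s / total for s in scores])
    return result


if __name__ == "__main__":
    text = [TextEntity({"red", "truck"}, {"moving_north"})]
    video = [
        VideoEntity({"red": 0.9, "truck": 0.8}, {"moving_north"}),
        VideoEntity({"blue": 0.7, "car": 0.9}, {"stationary"}),
    ]
    print(association_scores(text, video))  # first video entity dominates
```

In this toy run, the first video entity wins on both appearance and behavior, so nearly all of the association mass falls on it; the relative weights w_app and w_beh stand in for whatever joint optimization the paper actually performs.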

