Conference: Machine intelligence and bio-inspired computation: theory and applications VII

Fusing video and text data by integrating appearance and behavior similarity


Abstract

In this paper, we describe an algorithm for multi-modal entity co-reference resolution and present experimental results using text and motion imagery data sources. Our model generates probabilistic associations between entities mentioned in text and entities detected in video data by jointly optimizing measures of appearance and behavior similarity. Appearance similarity is calculated as a match between proposition-derived entity attributes mentioned in text and the object appearance classification from video sources. Behavior similarity is calculated from semantic information about entity movements, actions, and interactions with other entities, as mentioned in text and detected in video sources. Our model achieved a 79% F-score for text-to-video entity co-reference resolution; we show that entity interactions provide distinctive features for resolving the variability present in text data and the ambiguity in the visual appearance of entities.
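A minimal sketch of the idea described above: combining an appearance score (text attributes vs. video appearance labels) and a behavior score (overlap of semantic movement/action/interaction labels) into a normalized association distribution. All class names, functions, weights, and labels here are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of text-to-video entity association by fusing
# appearance and behavior similarity (hypothetical data structures and weights).

from dataclasses import dataclass


@dataclass
class TextEntity:
    """Entity mentioned in text: proposition-derived attributes and behaviors."""
    attributes: set        # e.g. {"red", "truck"}
    behaviors: set         # e.g. {"moving_north", "stops_near:building"}


@dataclass
class VideoEntity:
    """Entity detected in video: appearance classification and observed behaviors."""
    appearance_labels: dict  # classifier label -> confidence, e.g. {"truck": 0.8}
    behaviors: set           # semantic movement/interaction labels


def appearance_similarity(t: TextEntity, v: VideoEntity) -> float:
    """Match text attributes against the video appearance classification."""
    if not t.attributes:
        return 0.0
    matched = sum(v.appearance_labels.get(a, 0.0) for a in t.attributes)
    return matched / len(t.attributes)


def behavior_similarity(t: TextEntity, v: VideoEntity) -> float:
    """Jaccard overlap of semantic behavior labels (movements, actions, interactions)."""
    union = t.behaviors | v.behaviors
    if not union:
        return 0.0
    return len(t.behaviors & v.behaviors) / len(union)


def association_scores(text_entities, video_entities, w_app=0.5, w_beh=0.5):
    """For each text entity, return a normalized distribution over video entities."""
    result = []
    for t in text_entities:
        scores = [
            w_app * appearance_similarity(t, v) + w_beh * behavior_similarity(t, v)
            for v in video_entities
        ]
        total = sum(scores) or 1.0
        result.append([s / total for s in scores])
    return result


if __name__ == "__main__":
    text = [TextEntity({"red", "truck"}, {"moving_north"})]
    video = [
        VideoEntity({"red": 0.9, "truck": 0.8}, {"moving_north"}),
        VideoEntity({"blue": 0.7, "car": 0.9}, {"stationary"}),
    ]
    print(association_scores(text, video))  # first video entity dominates
```

In this toy run, the first video entity wins on both appearance and behavior, so nearly all of the association mass falls on it; the relative weights w_app and w_beh stand in for whatever joint optimization the paper actually performs.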

