Identifying Visible Actions in Lifestyle Vlogs

Abstract

We consider the task of identifying human actions visible in online videos. We focus on the widespread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify whether actions mentioned in the speech description of a video are visually present. We construct a dataset with crowdsourced manual annotations of visible actions and introduce a multimodal algorithm that leverages visual and linguistic cues to automatically infer which actions are visible in a video. We demonstrate that our multimodal algorithm outperforms algorithms that use only one modality at a time.
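
The abstract does not describe how the visual and linguistic signals are combined, so the following is only a minimal sketch of one generic option, late fusion by weighted averaging, and not the authors' model. The `ActionCandidate` class, the `fuse` and `is_visible` functions, the weights, the threshold, and the example scores are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ActionCandidate:
    """An action phrase mentioned in a video's transcript (hypothetical structure)."""
    text: str            # action phrase taken from the speech description
    visual_score: float  # assumed score in [0, 1] from some visual model
    text_score: float    # assumed score in [0, 1] from some language model


def fuse(candidate: ActionCandidate, visual_weight: float = 0.5) -> float:
    """Late fusion: weighted average of the two per-modality scores."""
    return (visual_weight * candidate.visual_score
            + (1.0 - visual_weight) * candidate.text_score)


def is_visible(candidate: ActionCandidate,
               visual_weight: float = 0.5,
               threshold: float = 0.5) -> bool:
    """Predict that the action is visible when the fused score reaches the threshold."""
    return fuse(candidate, visual_weight) >= threshold


# Illustrative inputs only; these scores are made up, not taken from the dataset.
candidates = [
    ActionCandidate("chop the onions", visual_score=0.9, text_score=0.7),
    ActionCandidate("went to the store yesterday", visual_score=0.2, text_score=0.1),
]
labels = {c.text: is_visible(c) for c in candidates}
```

Setting `visual_weight` to 1.0 or 0.0 reduces the sketch to a single-modality baseline, which is the kind of comparison the abstract refers to.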