
From Lifestyle Vlogs to Everyday Interactions


Abstract

A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is the lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start with a large collection of interaction-rich video data and then annotate and analyze it. We use Internet Lifestyle Vlogs as the source of surprisingly large and diverse interaction data. We show that by collecting the data first, we are able to achieve greater scale and far greater diversity in terms of actions and actors. Additionally, our data exposes biases built into common explicitly gathered data. We make sense of our data by analyzing the central component of interaction: hands. We benchmark two tasks: identifying semantic object contact at the video level and non-semantic contact state at the frame level. We additionally demonstrate future prediction of hands.
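The two benchmarks are, at heart, classification problems: a video-level label for which object a hand contacts, and a frame-level binary contact state. As a minimal sketch only, assuming a generic ResNet-18 backbone in PyTorch (the abstract does not specify any architecture; every name below is illustrative, not the authors' setup), the frame-level contact-state task could be posed as:

```python
# Hypothetical sketch of the frame-level contact-state benchmark:
# binary classification (contact vs. no contact) over RGB frames.
# Backbone choice and input size are assumptions, not the paper's method.
import torch
import torch.nn as nn
from torchvision import models


class ContactStateClassifier(nn.Module):
    """Predicts hand contact state for a single frame (illustrative)."""

    def __init__(self, num_states: int = 2):
        super().__init__()
        # Generic ImageNet-style backbone; in practice one would load
        # pretrained weights before fine-tuning on contact labels.
        self.backbone = models.resnet18()
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_states)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) normalized RGB frames or hand crops.
        return self.backbone(frames)  # (batch, num_states) logits


if __name__ == "__main__":
    model = ContactStateClassifier()
    logits = model(torch.randn(4, 3, 224, 224))  # four dummy frames
    print(logits.shape)  # torch.Size([4, 2])
```

The video-level semantic-contact task could reuse the same per-frame features pooled over time; the paper benchmarks both tasks without this sketch prescribing its actual models.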
