Venue: European Conference on Computer Vision

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset


Abstract

First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55h of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens.
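The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below derives approximate rates (frames per second, action segments per hour, and so on) from the rounded totals quoted above. The derived values are estimates from those rounded totals, not numbers reported by the authors, and the function name `derived_rates` is a hypothetical helper introduced only for illustration.

```python
# Totals as quoted in the abstract (rounded by the authors).
HOURS_OF_VIDEO = 55
TOTAL_FRAMES = 11_500_000
ACTION_SEGMENTS = 39_600
OBJECT_BOXES = 454_300


def derived_rates(hours, frames, segments, boxes):
    """Return approximate per-unit rates implied by the dataset totals.

    Hypothetical helper: the outputs are back-of-the-envelope estimates,
    limited by the rounding of the inputs.
    """
    seconds = hours * 3600
    return {
        "frames_per_second": frames / seconds,
        "segments_per_hour": segments / hours,
        "seconds_per_segment": seconds / segments,
        "boxes_per_segment": boxes / segments,
    }


rates = derived_rates(HOURS_OF_VIDEO, TOTAL_FRAMES, ACTION_SEGMENTS, OBJECT_BOXES)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Under these rounded inputs, the totals work out to roughly 58 frames per second and about one labelled action segment every five seconds of video, which gives a sense of how dense the annotation is.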
