
Binge Watching: Scaling Affordance Learning from Sitcoms


Abstract

In recent years, there has been a renewed interest in jointly modeling perception and action. At the core of this investigation is the idea of modeling affordances. However, when it comes to predicting affordances, even state-of-the-art approaches still do not use ConvNets. Why is that? Unlike semantic or 3D tasks, there still does not exist any large-scale dataset for affordances. In this paper, we tackle the challenge of creating one of the biggest datasets for learning affordances. We use seven sitcoms to extract a diverse set of scenes and the ways actors interact with different objects in those scenes. Our dataset consists of more than 10K scenes and 28K ways humans can interact with these 10K images. We also propose a two-step approach to predict affordances in a new scene. In the first step, given a location in the scene, we classify which of 30 pose classes is the likely affordance pose. Given the pose class and the scene, we then use a Variational Autoencoder (VAE) [23] to extract the scale and deformation of the pose. The VAE allows us to sample the distribution of possible poses at test time. Finally, we show the importance of large-scale data in learning a generalizable and robust model of affordances.
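To make the two-step pipeline concrete, below is a minimal PyTorch sketch of the inference path: a classifier scores the 30 pose classes at a scene location, and a conditional VAE decoder, sampled from its prior at test time, produces the pose's scale and keypoint deformation. This is not the authors' implementation; the feature sizes, latent size, 17-keypoint pose layout, and all module names are illustrative assumptions.

# A minimal sketch (not the authors' code) of the two-step affordance
# approach from the abstract: (1) classify one of 30 pose classes at a
# scene location, (2) sample a conditional VAE to get scale/deformation.
# Feature sizes, latent size, and the 17-keypoint layout are assumptions.
import torch
import torch.nn as nn

NUM_POSE_CLASSES = 30   # pose classes, per the abstract
FEAT_DIM = 512          # assumed scene/location feature size
LATENT_DIM = 32         # assumed VAE latent size
POSE_DIM = 2 * 17 + 1   # assumed: 17 (x, y) keypoint offsets + one scale

class PoseClassifier(nn.Module):
    # Step 1: score the 30 pose classes from a scene-location feature.
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(FEAT_DIM, NUM_POSE_CLASSES)

    def forward(self, feat):            # feat: (B, FEAT_DIM)
        return self.head(feat)          # logits: (B, 30)

class ConditionalVAEDecoder(nn.Module):
    # Step 2: decode a latent sample, conditioned on the scene feature
    # and pose class, into the pose's scale and keypoint deformation.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + FEAT_DIM + NUM_POSE_CLASSES, 256),
            nn.ReLU(),
            nn.Linear(256, POSE_DIM),
        )

    def forward(self, z, feat, class_onehot):
        return self.net(torch.cat([z, feat, class_onehot], dim=1))

@torch.no_grad()
def predict_affordance(feat, classifier, decoder, n_samples=5):
    # Test-time inference: pick the most likely pose class, then draw
    # several pose hypotheses by sampling the VAE prior z ~ N(0, I).
    cls = classifier(feat).argmax(dim=1)                          # (B,)
    onehot = nn.functional.one_hot(cls, NUM_POSE_CLASSES).float()
    poses = []
    for _ in range(n_samples):
        z = torch.randn(feat.size(0), LATENT_DIM)
        poses.append(decoder(z, feat, onehot))
    return cls, torch.stack(poses, dim=1)   # (B,), (B, n_samples, POSE_DIM)

# Usage with random inputs:
# feat = torch.randn(1, FEAT_DIM)
# cls, pose_samples = predict_affordance(feat, PoseClassifier(), ConditionalVAEDecoder())

Drawing several latent samples per location reflects the abstract's point that the VAE yields a distribution over plausible poses rather than a single deterministic prediction.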
