Conference: Iranian Conference on Signal Processing and Intelligent Systems

Zero-Shot Learning on Human-Object Interaction Recognition in Video



Abstract

Recognition of human activities is an essential field in computer vision. Many human activities consist of human-object interaction (HOI). Much successful work has been done on HOI recognition with acceptable results, but these methods are fully supervised and need labeled training data for every HOI. The space of possible human-object interactions is huge, and listing and providing training data for all categories is costly and impractical. We tackle this problem by proposing an approach for scaling human-object interaction recognition in video data through zero-shot learning. Our method recognizes a verb and an object from a video and combines them into an HOI class. Recognizing verbs and objects instead of whole HOIs allows a new combination of a verb and an object to be identified as a new HOI class that the recognizer model has not seen. We introduce a neural network architecture that can understand video data. The proposed model learns verbs and objects from the available training data during training, and at test time it detects verb-object pairs in a video and thereby identifies the HOI class. We evaluated our model on the recently introduced Charades dataset, which contains many HOI categories in videos. We show that our model can detect unseen HOI classes in addition to recognizing seen classes acceptably. As a result, the number of identifiable categories exceeds the number of training classes.
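The compositional idea in the abstract (predict a verb and an object separately, then pair them into an HOI class) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the label lists, the set of seen pairs, the function names, and in particular the rule of scoring a pair by the product of the verb and object probabilities, which the abstract does not specify. It is not the authors' network, only the final pairing step under those assumptions.

```python
import math

# Hypothetical label sets; the actual Charades vocabulary is not reproduced here.
VERBS = ["hold", "open", "drink_from"]
OBJECTS = ["cup", "door", "book"]

# Hypothetical (verb, object) pairs assumed to have had training videos.
SEEN_PAIRS = {
    ("hold", "cup"),
    ("open", "door"),
    ("hold", "book"),
    ("drink_from", "cup"),
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_hoi(verb_logits, object_logits):
    """Return the best (verb, object) pair and whether it was seen in training.

    Assumption: a pair is scored by the product of the two class
    probabilities, treating the verb and object predictions as independent.
    """
    verb_probs = softmax(verb_logits)
    object_probs = softmax(object_logits)
    best_pair, best_score = None, -1.0
    for verb, p_verb in zip(VERBS, verb_probs):
        for obj, p_obj in zip(OBJECTS, object_probs):
            score = p_verb * p_obj
            if score > best_score:
                best_pair, best_score = (verb, obj), score
    return best_pair, best_pair in SEEN_PAIRS


if __name__ == "__main__":
    # Made-up logits standing in for the outputs of a video model.
    pair, seen = predict_hoi([0.1, 3.0, 0.2], [0.0, 0.1, 4.0])
    print(pair, "seen in training" if seen else "unseen combination")
```

In this toy setup, three verbs and three objects give nine nameable HOI classes even though only four pairs are assumed to have training data, which mirrors the abstract's point that more categories can be identified than were trained on.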
