Zero-Shot Learning on Human-Object Interaction Recognition in Video

Abstract

Recognition of human activities is an essential field in computer vision. Many human activities consist of human-object interactions (HOIs). Much successful work has been done on HOI recognition and has achieved acceptable results, but these methods are fully supervised and need labeled training data for every HOI. The space of possible human-object interactions is huge, and listing and providing training data for all categories is costly and impractical. We tackle this problem by proposing an approach that scales human-object interaction recognition in video data through zero-shot learning. Our method recognizes a verb and an object in a video and combines them into an HOI class. Recognizing verbs and objects instead of whole HOIs allows a new combination of a verb and an object to be identified as a new HOI class that the recognizer model has not seen. We introduce a neural network architecture that can understand video data. The proposed model learns verbs and objects from the available training data during the training phase, and at test time it detects the verb-object pair in a video and thereby identifies the HOI class. We evaluated our model on the recently introduced Charades dataset, which contains many HOI categories in videos. We show that our model can detect unseen HOI classes in addition to recognizing seen classes acceptably, so the number of identifiable categories is larger than the number of training classes.
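The compositional idea in the abstract (predict a verb and an object separately, then pair them into an HOI class) can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not say how verb and object predictions are combined, so the sketch assumes the two are scored independently and multiplies their softmax probabilities. The function names, labels, and logit values are all hypothetical.

```python
import math
from itertools import product


def softmax(logits):
    """Turn a {label: logit} dict into a {label: probability} dict."""
    peak = max(logits.values())
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def rank_hoi(verb_logits, object_logits, top_k=3):
    """Score every (verb, object) pair, assuming the two are independent."""
    verb_p = softmax(verb_logits)
    object_p = softmax(object_logits)
    pairs = {
        (verb, obj): verb_p[verb] * object_p[obj]
        for verb, obj in product(verb_p, object_p)
    }
    return sorted(pairs.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical HOI classes that appeared in training.
seen_pairs = {("open", "door"), ("hold", "cup"), ("open", "book")}

# Hypothetical outputs of a verb head and an object head for one video.
verb_logits = {"open": 0.2, "hold": 2.5}
object_logits = {"door": 0.1, "cup": 0.3, "book": 2.0}

best_pair, best_score = rank_hoi(verb_logits, object_logits, top_k=1)[0]
is_unseen = best_pair not in seen_pairs
print(best_pair, "unseen" if is_unseen else "seen")
```

With two verbs and three objects, the sketch can name six HOI classes even though only three pairs were seen in training, which is the scaling effect the abstract describes.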

Bibliographic details

Source
Iranian Conference on Signal Processing and Intelligent Systems | 2019 | pp. 1-7 | 7 pages
Authors
Vali Ollah Maraghi; Karim Faez
Original format: PDF
Keywords
Feature extraction; Training; Task analysis; Training data; Object recognition; Data models; Object detection

Similar literature

1. Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning [J]. Vali Ollah Maraghi, Karim Faez. Computational Intelligence and Neuroscience. 2021, issue a
2. Generalized zero-shot learning for action recognition with web-scale video data [J]. Liu Kun, Liu Wu, Ma Huadong. World Wide Web. 2019, issue 2
3. Explicit Modeling of Human-Object Interactions in Realistic Videos [J]. Prest Alessandro, Ferrari Vittorio, Schmid Cordelia. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 2013, issue 4
4. Zero-Shot Learning on Human-Object Interaction Recognition in Video [C]. Vali Ollah Maraghi, Karim Faez. Iranian Conference on Signal Processing and Intelligent Systems. 2019
5. Visual Recognition and Synthesis of Human-Object Interactions [D]. Chao, Yu-Wei. 2019
6. Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning [O]. Vali Ollah Maraghi, Karim Faez. 2021
7. Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation [O]. Wei Feng, Wentao Liu, Tong Li. 2019
