Unsupervised Learning of Video Representations via Dense Trajectory Clustering

Abstract

This paper addresses the task of unsupervised learning of representations for action recognition in videos. Previous works proposed to utilize future prediction or other domain-specific objectives to train a network, but achieved only limited success. In contrast, in the related field of image representation learning, simpler, discrimination-based methods have recently bridged the gap to fully-supervised performance. We first propose to adapt two top-performing objectives in this class, instance recognition and local aggregation, to the video domain. In particular, the latter approach iterates between clustering the videos in the feature space of a network and updating the network to respect the clusters with a non-parametric classification loss. We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns, grouping the videos based on appearance. To mitigate this issue, we turn to the heuristic-based IDT descriptors, which were manually designed to encode motion patterns in videos. We form the clusters in the IDT space, using these descriptors as an unsupervised prior in the iterative local aggregation algorithm. Our experiments demonstrate that this approach outperforms prior work on the UCF101 and HMDB51 action recognition benchmarks. We also qualitatively analyze the learned representations and show that they successfully capture video dynamics.
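
As a rough illustration of the idea in the abstract, the sketch below clusters fixed motion descriptors (a stand-in for IDT features) and then scores a set of network embeddings against those clusters with a softmax-style non-parametric loss. It is not the authors' implementation: the function names, the farthest-point k-means initialization, the temperature value, and the toy data are all assumptions made for illustration, and the loss is a simplified cluster-membership objective rather than the exact local aggregation loss.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def mean_cluster_loss(embeddings, assign, temperature=0.07):
    """Average of -log(softmax mass each video puts on its own cluster).

    Similarities are dot products of unit-length embeddings divided by a
    temperature. The video itself counts as a member of its cluster, so a
    singleton cluster gives a loss that resembles instance recognition.
    """
    emb = [l2_normalize(e) for e in embeddings]
    losses = []
    for i, e_i in enumerate(emb):
        sims = [
            math.exp(sum(a * b for a, b in zip(e_i, e_j)) / temperature)
            for e_j in emb
        ]
        same = sum(s for j, s in enumerate(sims) if assign[j] == assign[i])
        losses.append(-math.log(same / sum(sims)))
    return sum(losses) / len(losses)


# Hypothetical toy data: four videos, two motion patterns.
motion_descriptors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
clusters = kmeans(motion_descriptors, k=2)

# Embeddings that group videos 0-1 and 2-3 (agrees with the motion clusters).
motion_aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Embeddings that group videos 0-2 and 1-3 (e.g. by appearance instead).
appearance_aligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
```

In this toy setup, `mean_cluster_loss(motion_aligned, clusters)` is lower than `mean_cluster_loss(appearance_aligned, clusters)`, so minimizing the loss would push a network toward embeddings that respect the motion-based clusters. In a real system the embeddings would come from a video network updated by gradient descent, which this sketch leaves out.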

Bibliographic record

Source
European Conference on Computer Vision, 2020, pp. 404-421 (18 pages)
Authors
Pavel Tokmakov; Martial Hebert; Cordelia Schmid
Original format
PDF
Keywords
Unsupervised representation learning; Action recognition

Similar literature

1. Estimating mechanical properties of cloth from videos using dense motion trajectories: Human psychophysics and machine learning [J]. Wenyan Bi, Peiran Jin, Hendrikje Nienborg. Journal of Vision, 2018, No. 5.
2. Bottom-up unsupervised image segmentation using FC-Dense u-net based deep representation clustering and multidimensional feature fusion based region merging [J]. Image and Vision Computing, 2020, No. Feba.
3. Video trajectory analysis using unsupervised clustering and multi-criteria ranking [J]. Sekh Arif Ahmed, Dogra Debi Prosad, Kar Samarjit. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 2020, No. 21.
4. Fusion of learned multi-modal representations and dense trajectories for emotional analysis in videos [C]. Acar Esra, Hopfgartner Frank, Albayrak Sahin. International Workshop on Content-Based Multimedia Indexing, 2015.
5. Unsupervised Data Driven Machine Learning in Hyperspectral Imaging and Echocardiography Videos [D]. Shahid, Kazi Tanzeem. 2021.
6. Dense Trajectories and DHOG for Classification of Viewpoints from Echocardiogram Videos [O]. Liqin Huang, Xiangyu Zhang, Wei Li. 2016.
7. Unsupervised Learning of Video Representations via Dense Trajectory Clustering [O]. Pavel Tokmakov, Martial Hebert, Cordelia Schmid. 2020.
