What If We Do Not have Multiple Videos of the Same Action? — Video Action Localization Using Web Images

Abstract

This paper tackles the problem of spatio-temporal action localization in a video without assuming the availability of multiple videos or any prior annotations. The action is localized using images downloaded from the internet with the action name as the query. Given the web images, we first dampen image noise using a random walk and avoid distracting backgrounds within the images using image action proposals. Then, given a video, we generate multiple spatio-temporal action proposals and suppress proposals generated by camera motion or background by exploiting the optical flow gradients within them. To obtain the proposals most representative of the action, we reconstruct the action proposals in the video from the action proposals in the images. Moreover, to preserve the temporal smoothness of the video, we reconstruct all proposal bounding boxes jointly, using constraints that push the coefficients of each bounding box toward a common consensus and thus enforce coefficient similarity across multiple frames. We solve this optimization problem with a variant of the two-metric projection algorithm. Finally, the video proposal that has the lowest reconstruction cost and is motion salient is used to localize the action. Our method applies not only to trimmed videos but also to action localization in untrimmed videos, which is a very challenging problem. We present extensive experiments on trimmed and untrimmed datasets to validate the effectiveness of the proposed approach.
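The reconstruction-and-selection idea summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several pieces are assumptions:

* The function names are hypothetical.
* Proposal appearance is reduced to synthetic feature vectors.
* The random-walk denoising is assumed to be a PageRank-style walk over cosine similarities.
* The coefficients are assumed to be non-negative.
* Plain projected gradient descent stands in for the paper's two-metric projection solver.
* Motion salience, which the paper derives from optical flow gradients, is simply passed in as a per-proposal score.

```python
import numpy as np


def random_walk_scores(feats, alpha=0.85, iters=100):
    # Hypothetical PageRank-style walk on a cosine-similarity graph of web images:
    # images resembling many others score high; outliers (likely noise) score low.
    X = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    W = np.clip(X @ X.T, 0.0, None)  # keep non-negative similarities only
    np.fill_diagonal(W, 0.0)  # no self-loops
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-normalize
    n = len(X)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - alpha) / n + alpha * (P.T @ r)  # teleport + walk step
    return r


def consensus_reconstruction(Xv, D, lam=1.0, iters=500):
    # Reconstruct per-frame proposal features Xv (T x d) from a dictionary D (d x k)
    # of image-proposal features by minimizing
    #   sum_t ||x_t - D a_t||^2 + lam * sum_t ||a_t - mean_s a_s||^2,  a_t >= 0,
    # with projected gradient descent (a stand-in for two-metric projection).
    T, k = Xv.shape[0], D.shape[1]
    A = np.zeros((T, k))
    step = 1.0 / (2.0 * (np.linalg.norm(D, 2) ** 2 + lam))  # 1 / Lipschitz bound
    for _ in range(iters):
        R = A @ D.T - Xv  # residuals, T x d
        grad = 2.0 * R @ D + 2.0 * lam * (A - A.mean(axis=0))
        A = np.maximum(A - step * grad, 0.0)  # project onto a_t >= 0
    # Mean per-frame reconstruction error of the proposal.
    return float(np.mean(np.sum((Xv - A @ D.T) ** 2, axis=1)))


def select_proposal(proposals, D, motion, min_motion=0.0, lam=1.0):
    # Among motion-salient proposals (score > min_motion), return the index with
    # the lowest reconstruction cost; fall back to all proposals if none qualify.
    costs = [consensus_reconstruction(P, D, lam) for P in proposals]
    candidates = [i for i, m in enumerate(motion) if m > min_motion]
    candidates = candidates or list(range(len(proposals)))
    return min(candidates, key=lambda i: costs[i])


# Toy usage with synthetic features (no real images or videos involved).
rng = np.random.default_rng(0)
action = rng.random(16)  # hypothetical appearance of the action
imgs = np.vstack([action + 0.05 * rng.random((20, 16)),  # relevant web images
                  rng.random((5, 16))])  # noisy, unrelated web images
scores = random_walk_scores(imgs)
D = imgs[scores >= np.median(scores)].T  # keep the top-ranked half as dictionary
good = action + 0.05 * rng.random((8, 16))  # proposal tracking the action
bad = rng.random((8, 16))  # background proposal
best = select_proposal([bad, good], D, motion=[1.0, 1.0])
print("selected proposal:", best)
```

The consensus term pulls every frame's coefficient vector toward the mean across frames. That is one simple way to express the temporal smoothness described in the abstract. A proposal that tracks the action is reconstructed well in every frame, while a background proposal is not, so it incurs a higher cost.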

Bibliographic Details

Source
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1077-1085 (9 pages)
Authors
Waqas Sultani, Mubarak Shah
Original format: PDF
Keywords
Videos; Proposals; Image reconstruction; Noise measurement; Internet; Google; Detectors

Similar Literature

1. Localized Multiple Kernel Learning for Realistic Human Action Recognition in Videos [J]. Song Y., Zheng Y.-T., Tang S. IEEE Transactions on Circuits and Systems for Video Technology, 2011, no. 9.

2. Localizing web videos using social images [J]. Cao Liujuan, Liu Xian-Ming, Liu Wei. Information Sciences: An International Journal, 2015.

3. Temporal Action Localization in Untrimmed Videos Using Action Pattern Trees [J]. Song Hao, Wu Xinxiao, Zhu Bing. IEEE Transactions on Multimedia, 2019, no. 3.

4. What if we do not have multiple videos of the same action? - Video Action Localization Using Web Images [C]. Waqas Sultani, Mubarak Shah. IEEE Conference on Computer Vision and Pattern Recognition, 2016.

5. Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions [D]. Xiang Xiang, 2018.

6. Gender Recognition from Human-Body Images Using Visible-Light and Thermal Camera Videos Based on a Convolutional Neural Network for Image Feature Extraction [O]. Dat Tien Nguyen, Ki Wan Kim, Hyung Gil Hong, 2017.

7. Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images [O]. Chen Sun, Sanketh Shetty, Rahul Sukthankar, 2015.
