International Conference on Computer Vision

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition



Abstract

Video recognition has drawn great research interest, and substantial progress has been made. A suitable frame sampling strategy can improve both the accuracy and the efficiency of recognition. However, mainstream solutions generally adopt hand-crafted frame sampling strategies, which can degrade performance, especially on untrimmed videos, because frame-level saliency varies over time. To this end, we concentrate on improving untrimmed video classification by developing a learning-based frame sampling strategy. We intuitively formulate the frame sampling procedure as multiple parallel Markov decision processes, each of which aims to pick out a frame/clip by gradually adjusting an initial sampling position. We then propose to solve these processes with multi-agent reinforcement learning (MARL). Our MARL framework consists of three components: a novel RNN-based context-aware observation network that jointly models the context of nearby agents and the historical states of each agent, a policy network that generates a probability distribution over a predefined action space at each step, and a classification network used both for reward calculation and for final recognition. Extensive experimental results show that our MARL-based scheme substantially outperforms hand-crafted strategies across various 2D and 3D baseline methods. Our single RGB model achieves performance comparable to the ActivityNet v1.3 champion submission, which relied on multi-modal, multi-model fusion, and sets new state-of-the-art results on YouTube Birds and YouTube Cars.
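To make the sampling procedure described above concrete, the following is a minimal, hypothetical PyTorch-style sketch of the agents' adjustment loop. It is not the authors' implementation: the 3-way action space (shift left / stay / shift right), the GRU observation cell, the feature dimensions, the number of agents and steps, and the uniform initialisation are all assumptions for illustration, and the neighbour-context modelling and the classifier-based reward used for policy training are omitted for brevity.

```python
"""Sketch of MARL-style frame sampling: several agents each hold one frame
index and iteratively shift it using a policy over a small discrete action
space; a GRU-based observation network tracks each agent's history."""
import torch
import torch.nn as nn

class ObservationRNN(nn.Module):
    """Fuses an agent's current frame feature with its historical state."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, hidden_dim)

    def forward(self, frame_feat, h_prev):
        return self.gru(frame_feat, h_prev)

class PolicyNet(nn.Module):
    """Maps an agent's hidden state to a distribution over the action space."""
    def __init__(self, hidden_dim=256, num_actions=3):  # left / stay / right (assumed)
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_actions)

    def forward(self, h):
        return torch.distributions.Categorical(logits=self.fc(h))

def sample_frames(frame_feats, num_agents=8, num_steps=5):
    """frame_feats: (T, feat_dim) per-frame features of one untrimmed video."""
    T, D = frame_feats.shape
    obs_net, policy = ObservationRNN(D), PolicyNet()
    # Initialise agents at uniformly spaced positions over the video.
    positions = torch.linspace(0, T - 1, num_agents).long()
    hidden = torch.zeros(num_agents, 256)
    for _ in range(num_steps):
        feats = frame_feats[positions]            # each agent observes its frame
        hidden = obs_net(feats, hidden)           # history-aware agent state
        actions = policy(hidden).sample()         # 0: left, 1: stay, 2: right
        positions = (positions + actions - 1).clamp(0, T - 1)
    return positions                              # indices passed to the classifier

if __name__ == "__main__":
    feats = torch.randn(300, 512)                 # e.g. 300 frames, 512-d features
    print(sample_frames(feats))
```

In a full training loop, the selected frames would be fed to the classification network, and its confidence on the ground-truth label could serve as the reward for policy-gradient updates; those details are not specified in the abstract and are left out here.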
