International Conference on Computer Vision

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition



Abstract

Video recognition has drawn great research interest, and substantial progress has been made. A suitable frame sampling strategy can improve both the accuracy and the efficiency of recognition. However, mainstream solutions generally adopt hand-crafted frame sampling strategies, which can degrade performance, especially on untrimmed videos, because frame-level saliency varies over time. To this end, we concentrate on improving untrimmed video classification by developing a learning-based frame sampling strategy. We intuitively formulate the frame sampling procedure as multiple parallel Markov decision processes, each of which aims to pick out a frame/clip by gradually adjusting an initial sampling position. We then propose to solve these processes with multi-agent reinforcement learning (MARL). Our MARL framework consists of three components: a novel RNN-based context-aware observation network that jointly models the context of nearby agents and the historical states of each agent, a policy network that generates a probability distribution over a predefined action space at each step, and a classification network used both for reward calculation and for final recognition. Extensive experimental results show that our MARL-based scheme substantially outperforms hand-crafted strategies across various 2D and 3D baseline methods. Our single RGB model achieves performance comparable to the ActivityNet v1.3 champion submission, which relied on multi-modal, multi-model fusion, and sets new state-of-the-art results on YouTube Birds and YouTube Cars.
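To make the sampling procedure described above concrete, the following is a minimal, hypothetical PyTorch-style sketch of the agents' adjustment loop. It is not the authors' implementation: the 3-way action space (shift left / stay / shift right), the GRU observation cell, the feature dimensions, the number of agents and steps, and the uniform initialisation are all assumptions for illustration, and the neighbour-context modelling and the classifier-based reward used for policy training are omitted for brevity.

```python
"""Sketch of MARL-style frame sampling: several agents each hold one frame
index and iteratively shift it using a policy over a small discrete action
space; a GRU-based observation network tracks each agent's history."""
import torch
import torch.nn as nn

class ObservationRNN(nn.Module):
    """Fuses an agent's current frame feature with its historical state."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, hidden_dim)

    def forward(self, frame_feat, h_prev):
        return self.gru(frame_feat, h_prev)

class PolicyNet(nn.Module):
    """Maps an agent's hidden state to a distribution over the action space."""
    def __init__(self, hidden_dim=256, num_actions=3):  # left / stay / right (assumed)
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_actions)

    def forward(self, h):
        return torch.distributions.Categorical(logits=self.fc(h))

def sample_frames(frame_feats, num_agents=8, num_steps=5):
    """frame_feats: (T, feat_dim) per-frame features of one untrimmed video."""
    T, D = frame_feats.shape
    obs_net, policy = ObservationRNN(D), PolicyNet()
    # Initialise agents at uniformly spaced positions over the video.
    positions = torch.linspace(0, T - 1, num_agents).long()
    hidden = torch.zeros(num_agents, 256)
    for _ in range(num_steps):
        feats = frame_feats[positions]            # each agent observes its frame
        hidden = obs_net(feats, hidden)           # history-aware agent state
        actions = policy(hidden).sample()         # 0: left, 1: stay, 2: right
        positions = (positions + actions - 1).clamp(0, T - 1)
    return positions                              # indices passed to the classifier

if __name__ == "__main__":
    feats = torch.randn(300, 512)                 # e.g. 300 frames, 512-d features
    print(sample_frames(feats))
```

In a full training loop, the selected frames would be fed to the classification network, and its confidence on the ground-truth label could serve as the reward for policy-gradient updates; those details are not specified in the abstract and are left out here.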
