IEEE Conference on Computer Vision and Pattern Recognition

Long-term recurrent convolutional networks for visual recognition and description

Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
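To make the "doubly deep" idea concrete, the sketch below pairs a per-frame convolutional encoder with an LSTM, so a sequence-level loss backpropagates through both the temporal and spatial "layers". This is an illustrative PyTorch sketch, not the authors' released implementation; the tiny CNN, the 256-dimensional feature and hidden sizes, the `LRCNSketch` name, and the last-step classifier are assumptions made for brevity.

```python
# A minimal LRCN-style sketch (not the paper's reference code): a per-frame
# CNN feature extractor feeds an LSTM that models temporal dynamics, and the
# whole pipeline is trained end-to-end. Layer sizes are illustrative only.
import torch
import torch.nn as nn


class LRCNSketch(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        # Small convolutional encoder applied independently to every frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # LSTM integrates per-frame features over time ("temporally deep").
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.view(b * t, c, h, w)).view(b, t, -1)
        outputs, _ = self.lstm(feats)           # (batch, time, hidden_dim)
        return self.classifier(outputs[:, -1])  # predict from the final step


if __name__ == "__main__":
    model = LRCNSketch(num_classes=10)
    video = torch.randn(2, 8, 3, 64, 64)  # 2 clips of 8 RGB frames, 64x64
    logits = model(video)
    print(logits.shape)  # torch.Size([2, 10])
```

Because the convolutional encoder and the LSTM sit in one computation graph, a single optimizer step updates both, which is the joint learning of temporal dynamics and convolutional perceptual representations described in the abstract.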
