IEEE Transactions on Pattern Analysis and Machine Intelligence

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description



Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent are effective for tasks involving sequences, visual and otherwise. We describe a class of recurrent convolutional architectures which is end-to-end trainable and suitable for large-scale visual understanding tasks, and demonstrate the value of these models for activity recognition, image captioning, and video description. In contrast to previous models which assume a fixed visual representation or perform simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they learn compositional representations in space and time. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Differentiable recurrent models are appealing in that they can directly map variable-length inputs (e.g., videos) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent sequence models are directly connected to modern visual convolutional network models and can be jointly trained to learn temporal dynamics and convolutional perceptual representations. Our results show that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined or optimized.
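The abstract describes per-frame convolutional features feeding a recurrent model whose state updates contain nonlinearities. The sketch below illustrates that activity-recognition pipeline under stated assumptions and is not the authors' implementation. A hypothetical `frame_features` function (one linear layer plus ReLU) stands in for a real CNN. A standard LSTM cell carries the temporal state. Per-timestep softmax class scores are averaged over the clip to give one prediction. The abstract does not specify that averaging step, so treat it as an assumption here. All names (`LSTMCell`, `lrcn_classify`, `rand_matrix`, and so on) are illustrative, and training by backpropagation is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Standard LSTM cell: gate nonlinearities are applied at every state update."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        concat = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: rand_matrix(hidden_size, concat, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def frame_features(frame, W_feat):
    """Hypothetical stand-in for a CNN: a single linear layer followed by ReLU."""
    return [max(0.0, v) for v in matvec(W_feat, frame)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lrcn_classify(frames, W_feat, lstm, W_out):
    """Run visual features through the LSTM frame by frame and average per-step class probabilities."""
    h = [0.0] * lstm.hidden_size
    c = [0.0] * lstm.hidden_size
    per_step = []
    for frame in frames:  # works for any number of frames (variable-length input)
        h, c = lstm.step(frame_features(frame, W_feat), h, c)
        per_step.append(softmax(matvec(W_out, h)))
    n_classes = len(per_step[0])
    return [sum(p[k] for p in per_step) / len(per_step) for k in range(n_classes)]

# Usage: a random 5-frame "video" of 12-pixel frames, 3 activity classes.
rng = random.Random(0)
W_feat = rand_matrix(8, 12, rng)
lstm = LSTMCell(8, 6, rng)
W_out = rand_matrix(3, 6, rng)
video = [[rng.random() for _ in range(12)] for _ in range(5)]
probs = lrcn_classify(video, W_feat, lstm, W_out)
print(len(probs), round(sum(probs), 6))
```

Because the visual feature extractor and the LSTM sit in one differentiable computation, a real implementation could backpropagate a loss on `probs` into both sets of weights. That joint training is what the abstract means by learning temporal dynamics and perceptual representations together.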
