IEEE Conference on Computer Vision and Pattern Recognition

Long-term recurrent convolutional networks for visual recognition and description

Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
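To make the "doubly deep" idea concrete, the sketch below pairs a per-frame convolutional encoder with an LSTM, so a sequence-level loss backpropagates through both the temporal and spatial "layers". This is an illustrative PyTorch sketch, not the authors' released implementation; the tiny CNN, the 256-dimensional feature and hidden sizes, the `LRCNSketch` name, and the last-step classifier are assumptions made for brevity.

```python
# A minimal LRCN-style sketch (not the paper's reference code): a per-frame
# CNN feature extractor feeds an LSTM that models temporal dynamics, and the
# whole pipeline is trained end-to-end. Layer sizes are illustrative only.
import torch
import torch.nn as nn


class LRCNSketch(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        # Small convolutional encoder applied independently to every frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # LSTM integrates per-frame features over time ("temporally deep").
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.view(b * t, c, h, w)).view(b, t, -1)
        outputs, _ = self.lstm(feats)           # (batch, time, hidden_dim)
        return self.classifier(outputs[:, -1])  # predict from the final step


if __name__ == "__main__":
    model = LRCNSketch(num_classes=10)
    video = torch.randn(2, 8, 3, 64, 64)  # 2 clips of 8 RGB frames, 64x64
    logits = model(video)
    print(logits.shape)  # torch.Size([2, 10])
```

Because the convolutional encoder and the LSTM sit in one computation graph, a single optimizer step updates both, which is the joint learning of temporal dynamics and convolutional perceptual representations described in the abstract.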
