Exploring the effects of non-local blocks on video captioning networks

Jaeyoung Lee; Junmo Kim


Abstract

In addition to visual features, video contains temporal information that conveys the semantic relationships between objects and scenes. Many attempts have been made to describe spatial and temporal relationships in video, but simple encoder-decoder models are not sufficient to capture detailed relationships in video clips. A video clip often consists of several seemingly unrelated shots, and simple recurrent models are degraded by these shot changes. In other fields, including visual question answering and action recognition, researchers have begun to take an interest in describing visual relations between objects. In this paper, we introduce a video captioning method that captures temporal relationships with a non-local block and a boundary-aware system. We evaluate our approach on the Microsoft Video Description Corpus (MSVD, also known as YouTube2Text) and the Microsoft Research Video to Text (MSR-VTT) dataset. The experimental results show that a non-local block applied along the temporal axis can improve captioning performance on these datasets.
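
The abstract does not give the exact formulation, so the following is only a rough sketch of what a non-local block applied along the temporal axis might look like. It assumes the embedded-Gaussian (softmax) form of the non-local operation with a residual connection, applied to a sequence of per-frame feature vectors. The function name, the weight names (w_theta, w_phi, w_g, w_out), and the dimensions are hypothetical, and the boundary-aware component is not modeled.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def temporal_non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Sketch of a non-local block over the time axis.

    x: (T, D) array, one D-dimensional feature vector per frame.
    w_theta, w_phi, w_g: (D, d) projection matrices (hypothetical names).
    w_out: (d, D) output projection back to the input dimension.
    Returns the (T, D) output and the (T, T) attention weights.
    """
    theta = x @ w_theta                      # queries, (T, d)
    phi = x @ w_phi                          # keys, (T, d)
    g = x @ w_g                              # values, (T, d)
    attention = softmax(theta @ phi.T, -1)   # every frame attends to all frames
    y = attention @ g                        # weighted sum over time, (T, d)
    return x + y @ w_out, attention          # residual connection


# Toy usage with random features: 8 frames, 16-dim features, 4-dim embedding.
rng = np.random.default_rng(0)
T, D, d = 8, 16, 4
x = rng.standard_normal((T, D))
w_theta, w_phi, w_g = (rng.standard_normal((D, d)) * 0.1 for _ in range(3))

# With a zero output projection the block reduces to the identity mapping.
z, attention = temporal_non_local_block(x, w_theta, w_phi, w_g, np.zeros((d, D)))

# With a non-zero output projection each frame mixes in information from others.
w_out = rng.standard_normal((d, D)) * 0.1
z2, _ = temporal_non_local_block(x, w_theta, w_phi, w_g, w_out)
```

Because every frame attends to every other frame in a single step, such a block is not limited to neighboring time steps the way a recurrent update is, which is one plausible reading of why it could help with clips made of several unrelated shots.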


Bibliographic record

Source
International Journal of Computational Vision and Robotics, 2019, No. 5, pp. 502-514 (13 pages)
Authors
Jaeyoung Lee; Junmo Kim
Affiliation

School of Electrical Engineering, Korea Advanced Institute of Science and Technology

Original format: PDF
Language: English
Keywords
video captioning; non-local mean; self-attention; video description


