Journal: International Journal of Computational Vision and Robotics

Exploring the effects of non-local blocks on video captioning networks

Abstract

In addition to visual features, video also contains temporal information that contributes to the semantic meaning of relationships between objects and scenes. There have been many attempts to describe spatial and temporal relationships in video, but simple encoder-decoder models are not sufficient for capturing detailed relationships in video clips. A video clip often consists of several seemingly unrelated shots, and simple recurrent models handle these shot changes poorly. In other fields, including visual question answering and action recognition, researchers have become interested in describing visual relations between objects. In this paper, we introduce a video captioning method that captures temporal relationships with a non-local block and a boundary-aware system. We evaluate our approach on the Microsoft Video Description Corpus (MSVD, YouTube2Text) dataset and the Microsoft Research Video to Text (MSR-VTT) dataset. The experimental results show that a non-local block applied along the temporal axis can improve captioning performance on both datasets.
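
To make the idea of a non-local block applied along the temporal axis more concrete, the sketch below shows one common formulation from the general literature: the embedded-Gaussian (softmax attention) non-local operation with a residual connection, computed over per-frame feature vectors. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify which variant the paper uses, the boundary-aware component is not shown, and the names `temporal_non_local`, `w_theta`, `w_phi`, `w_g`, `w_out`, and `rand_matrix` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_non_local(frames, w_theta, w_phi, w_g, w_out):
    """Assumed embedded-Gaussian non-local block over the time axis.

    frames: T x D per-frame features.
    w_theta, w_phi, w_g: D x K projections (hypothetical names).
    w_out: K x D projection back to the input width.
    Returns T x D features, each mixed with every other frame.
    """
    theta = matmul(frames, w_theta)  # T x K, "query" embedding per frame
    phi = matmul(frames, w_phi)      # T x K, "key" embedding per frame
    g = matmul(frames, w_g)          # T x K, "value" embedding per frame

    # Pairwise affinity between every pair of frames (T x T),
    # normalized per row so each frame's weights sum to 1.
    scores = matmul(theta, [list(col) for col in zip(*phi)])
    attention = [softmax(row) for row in scores]

    aggregated = matmul(attention, g)      # T x K weighted sum over all frames
    projected = matmul(aggregated, w_out)  # T x D

    # Residual connection: the block adds long-range context to the input.
    return [[x + z for x, z in zip(x_row, z_row)]
            for x_row, z_row in zip(frames, projected)]


def rand_matrix(rows, cols):
    """Small random matrix standing in for learned weights or CNN features."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
T, D, K = 5, 8, 4  # frames, feature width, embedding width (illustrative values)
frames = rand_matrix(T, D)
out = temporal_non_local(
    frames, rand_matrix(D, K), rand_matrix(D, K), rand_matrix(D, K), rand_matrix(K, D)
)
print(len(out), len(out[0]))  # 5 8
```

Because every frame attends to every other frame, the output at one time step can draw on frames from a different shot, which is the kind of long-range temporal relationship the abstract says simple recurrent models struggle with. With `w_out` set to zeros the block reduces to the identity, so it can be inserted into an existing encoder without changing its initial behavior.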