
Transformer-based Cross Reference Network for video salient object detection

Abstract

Video salient object detection is a fundamental computer vision task that aims to highlight the most conspicuous objects in a video sequence. It presents two key challenges: (1) how to extract effective feature representations from appearance and motion cues, and (2) how to combine them into a robust saliency representation. To address these challenges, we propose a novel Transformer-based Cross Reference Network (TCRN), which exploits long-range context dependencies in both feature extraction and cross-modal (i.e., appearance and motion) integration. In contrast to existing CNN-based methods, our approach formulates video salient object detection as a sequence-to-sequence prediction task. Deep features are extracted by a pure vision transformer with multi-resolution token representations. We further design a Gated Cross Reference (GCR) module to integrate appearance and motion into a saliency representation. The GCR first propagates global context information between the two modalities and then performs cross-modal fusion through a gate mechanism. Extensive evaluations on five widely used benchmarks show that the proposed Transformer-based method performs favorably against existing state-of-the-art methods.
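The abstract describes the GCR module only at a high level: context is first propagated between the appearance and motion streams, and the two are then fused through a gate. The sketch below is one plausible reading of those two steps, not the authors' implementation. It assumes plain scaled dot-product cross-attention without learned query/key/value projections, a residual connection, and a sigmoid gate computed from the concatenated features. The names `cross_attend`, `gated_cross_reference`, and `w_gate` are hypothetical and do not come from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token gathers a softmax-weighted mix of all context tokens."""
    scale = math.sqrt(len(queries[0]))
    mixed = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        mixed.append([sum(w * tok[j] for w, tok in zip(weights, context))
                      for j in range(len(q))])
    return mixed


def add(x, y):
    """Element-wise sum of two (n_tokens, dim) token lists."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def gated_cross_reference(appearance, motion, w_gate):
    """Hypothetical GCR-style fusion; w_gate has shape (2 * dim, dim)."""
    # Step 1 (assumed form): each modality references the other one globally,
    # with a residual connection back to its own tokens.
    app_ref = add(appearance, cross_attend(appearance, motion))
    mot_ref = add(motion, cross_attend(motion, appearance))

    # Step 2 (assumed form): a sigmoid gate per token and channel decides how
    # much of each modality enters the fused saliency representation.
    fused, gates = [], []
    for a, m in zip(app_ref, mot_ref):
        concat = a + m  # list concatenation, length 2 * dim
        gate = [1.0 / (1.0 + math.exp(-sum(c * w_gate[i][j]
                                           for i, c in enumerate(concat))))
                for j in range(len(a))]
        fused.append([g * x + (1.0 - g) * y for g, x, y in zip(gate, a, m)])
        gates.append(gate)
    return fused, gates


# Toy usage with random tokens standing in for transformer features.
random.seed(0)
n_tokens, dim = 4, 3
appearance = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
motion = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
w_gate = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(2 * dim)]
fused, gates = gated_cross_reference(appearance, motion, w_gate)
print(len(fused), len(fused[0]))
```

Because every query token attends to all tokens of the other modality, the exchange in step 1 is global instead of being limited to a local receptive field, which is the property the abstract contrasts with CNN-based methods. A real model would use learned projections, multiple heads, and trained gate weights.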

Bibliographic record

  • Source
    Pattern Recognition Letters | 2022, Issue 8 | pp. 122-127 | 6 pages
  • Author affiliations

    College of Information Engineering, Shanghai Maritime University;

    School of Software, Northwestern Polytechnical University;

    School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen;

    Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences;

  • Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Cross-modal integration; Transformer; Video salient object detection;
