Transformer-based Cross Reference Network for video salient object detection

Huang K.; Tian C.; Su J.; Lin J.C.-W.



Abstract

© 2022

Video salient object detection is a fundamental computer vision task that aims to highlight the most conspicuous objects in a video sequence. It presents two key challenges: (1) how to extract effective feature representations from appearance and motion cues, and (2) how to combine the two into a robust saliency representation. To address these challenges, in this paper we propose a novel Transformer-based Cross Reference Network (TCRN), which fully exploits long-range context dependencies in both feature extraction and cross-modal (i.e., appearance and motion) integration. In contrast to existing CNN-based methods, our approach formulates video salient object detection as a sequence-to-sequence prediction task. Deep features are extracted by a pure vision transformer with multi-resolution token representations. Furthermore, we design a Gated Cross Reference (GCR) module to integrate appearance and motion into a saliency representation: the GCR first propagates global context information between the two modalities and then performs cross-modal fusion through a gate mechanism. Extensive evaluations on five widely used benchmarks show that the proposed Transformer-based method performs favorably against existing state-of-the-art methods.
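The two steps attributed to the GCR module (propagate global context between modalities, then fuse them with a gate) can be sketched as below. This is a minimal, hypothetical illustration and not the authors' implementation. It makes several assumptions. Propagation is modeled as single-head scaled dot-product cross-attention with identity query/key/value projections and a residual connection. Fusion is a per-channel sigmoid gate whose weights (`w_app`, `w_mot`, `bias`) are fixed placeholders that a real network would learn. All function names are invented for this sketch, and tokens are plain Python lists.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token, one column per feature channel


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (identity projections assumed):
    every query token aggregates information from all context tokens."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*context)]  # transpose of the context
    scores = [[s * scale for s in row] for row in matmul(queries, keys_t)]
    return matmul(softmax_rows(scores), context)


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gated_cross_reference(appearance: Matrix, motion: Matrix,
                          w_app: float = 1.0, w_mot: float = -1.0,
                          bias: float = 0.0) -> Matrix:
    """Hypothetical GCR-style fusion of appearance and motion tokens."""
    # Step 1: exchange global context between the two modalities.
    app_ref = add(appearance, cross_attention(appearance, motion))
    mot_ref = add(motion, cross_attention(motion, appearance))
    # Step 2: a per-channel sigmoid gate g mixes the refined features as
    # g * appearance + (1 - g) * motion (placeholder gate weights).
    fused = []
    for ra, rm in zip(app_ref, mot_ref):
        row = []
        for a, m in zip(ra, rm):
            g = 1.0 / (1.0 + math.exp(-(w_app * a + w_mot * m + bias)))
            row.append(g * a + (1.0 - g) * m)
        fused.append(row)
    return fused


if __name__ == "__main__":
    appearance = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 tokens, 2 channels
    motion = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.8]]
    fused = gated_cross_reference(appearance, motion)
    print(len(fused), len(fused[0]))  # prints: 3 2
```

Because the gate produces a convex combination, each fused value lies between the refined appearance and motion values for that channel. Attention lets every token draw on the whole sequence, which mirrors the abstract's emphasis on long-range context.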

Bibliographic details

Source
Pattern recognition letters | 2022, Issue 8 | pp. 122-127 | 6 pages
Authors
Huang K.; Tian C.; Su J.; Lin J.C.-W.
Affiliations

College of Information Engineering, Shanghai Maritime University;

School of Software, Northwestern Polytechnical University;

School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen;

Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Full-text language: English
Keywords
Cross-modal integration; Transformer; Video salient object detection
