Image and Vision Computing

A cross-modal crowd counting method combining CNN and cross-modal transformer


Abstract

Cross-modal crowd counting aims to exploit the information shared between different modalities to generate crowd density maps, so that the number of pedestrians can be estimated more accurately in unconstrained scenes. Because images from different modalities differ substantially, how to fuse information across modalities effectively remains a challenging problem. To address this problem, we propose a cross-modal crowd counting method based on a CNN and a novel cross-modal transformer, which effectively fuses information between modalities and boosts the accuracy of crowd counting in unconstrained scenes. Concretely, we first design two CNN branches to capture the modality-specific features of the input images. We then design a novel cross-modal transformer that extracts cross-modal global features from the modality-specific features. Furthermore, we propose a cross-layer connection structure that links the front-end and back-end information of the network by adding features from different layers. At the end of the network, we develop a cross-modal attention module that strengthens the cross-modal feature representation by extracting the complementary information between different modal features. Experimental results show that the proposed method combining a CNN with a novel cross-modal transformer achieves state-of-the-art performance: it not only effectively improves the accuracy and robustness of cross-modal crowd counting, but also generalizes well to multimodal crowd counting. (c) 2022 Elsevier B.V. All rights reserved.
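The abstract does not give the exact definitions of the authors' modules, so the following is only an illustrative sketch of one generic cross-modal attention step of the kind the abstract describes: features from one modality (e.g., RGB) query the features of another (e.g., thermal), and the attended result is fused back residually. All names, token counts, and dimensions here are hypothetical, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feat, other_feat):
    """One attention direction: query-modality tokens attend to the
    other modality's tokens, returning other-modality information
    aligned to the query positions. Shapes: (N_q, d), (N_o, d)."""
    d = query_feat.shape[-1]
    scores = query_feat @ other_feat.T / np.sqrt(d)   # (N_q, N_o)
    attn = softmax(scores, axis=-1)                   # rows sum to 1
    return attn @ other_feat                          # (N_q, d)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 64))      # 16 spatial tokens, 64-dim each
thermal = rng.standard_normal((16, 64))

# Residual fusion: RGB features enriched with attended thermal features.
fused = rgb + cross_modal_attention(rgb, thermal)
print(fused.shape)  # (16, 64)
```

In a full model this step would typically be applied symmetrically (thermal attending to RGB as well), with learned query/key/value projections rather than the raw features used in this sketch.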
