Pattern Recognition: The Journal of the Pattern Recognition Society

MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification


Abstract

RGB-D indoor scene classification is an essential and challenging task. Although convolutional neural networks (CNNs) achieve excellent results on RGB-D object recognition, they face several limitations when extended to RGB-D indoor scene classification. 1) Semantic cues such as the objects in an indoor scene have high spatial variability, so the spatially rigid global representation produced by a CNN is suboptimal. 2) A cluttered indoor scene contains many redundant and noisy semantic cues, so discerning the discriminative information among them must not be ignored. 3) Directly concatenating or summing global RGB and depth information, as done in popular methods, cannot fully exploit the complementarity between the two modalities in complicated indoor scenarios. To address these problems, we propose a novel unified framework named Multi-modal Attentive Pooling Network (MAPNet). Two orderless attentive pooling blocks are constructed in MAPNet to aggregate semantic cues within and between modalities while maintaining spatial invariance. The Intra-modality Attentive Pooling (IAP) block mines and pools discriminative semantic cues in each modality. The Cross-modality Attentive Pooling (CAP) block then learns the different contributions of the two modalities, which further guides the pooling of the selected discriminative semantic cues from each modality. We further show that the proposed model is interpretable, which helps to understand the mechanisms of both scene classification and multi-modal fusion in MAPNet. Extensive experiments and analysis on the SUN RGB-D Dataset and NYU Depth Dataset V2 show the superiority of MAPNet over current state-of-the-art methods. (C) 2019 Elsevier Ltd. All rights reserved.
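The two pooling ideas described in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch rendering, assuming 1x1-convolution attention scores and a linear modality gate: an IAP-style block softmax-weights spatial locations before pooling (orderless, hence spatially invariant), and a CAP-style block learns per-modality contribution weights before fusion. The class names, layer shapes, and gating form are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch only; not the authors' exact MAPNet blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraModalityAttentivePool(nn.Module):
    """IAP-style block: a 1x1 conv scores each spatial location, softmax
    over H*W gives attention weights, and the feature map is reduced to a
    single attended vector, making the result orderless/spatially invariant."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        attn = F.softmax(self.score(x).view(b, 1, h * w), dim=-1)  # (B, 1, HW)
        feats = x.view(b, c, h * w)                                # (B, C, HW)
        return (feats * attn).sum(dim=-1)                          # (B, C)

class CrossModalityAttentivePool(nn.Module):
    """CAP-style block: learns two contribution weights from the
    concatenated pooled vectors and fuses the modalities accordingly."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Linear(2 * channels, 2)

    def forward(self, rgb_vec, depth_vec):           # each: (B, C)
        w = F.softmax(self.gate(torch.cat([rgb_vec, depth_vec], dim=1)), dim=1)
        return w[:, 0:1] * rgb_vec + w[:, 1:2] * depth_vec         # (B, C)

# Usage with dummy CNN feature maps from the two modality streams:
iap_rgb = IntraModalityAttentivePool(512)
iap_depth = IntraModalityAttentivePool(512)
cap = CrossModalityAttentivePool(512)
rgb_map, depth_map = torch.randn(4, 512, 7, 7), torch.randn(4, 512, 7, 7)
fused = cap(iap_rgb(rgb_map), iap_depth(depth_map))  # (4, 512) -> classifier
```

The design point the abstract emphasizes is that the weighted sum over spatial positions discards location, which is what makes the pooled representation robust to the high spatial variability of indoor semantic cues, while the learned gate replaces naive concatenation or summation of the two modalities.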

