Pattern Recognition: The Journal of the Pattern Recognition Society

MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification


Abstract

RGB-D indoor scene classification is an essential and challenging task. Although convolutional neural networks (CNNs) achieve excellent results on RGB-D object recognition, they face several limitations when extended to RGB-D indoor scene classification. 1) Semantic cues such as the objects in an indoor scene have high spatial variability, so the spatially rigid global representation produced by a CNN is suboptimal. 2) A cluttered indoor scene contains many redundant and noisy semantic cues, so discerning the discriminative information among them must not be ignored. 3) Directly concatenating or summing global RGB and depth information, as done in popular methods, cannot fully exploit the complementarity between the two modalities in complicated indoor scenarios. To address these problems, we propose a novel unified framework named Multi-modal Attentive Pooling Network (MAPNet). Two orderless attentive pooling blocks are constructed in MAPNet to aggregate semantic cues within and between modalities while maintaining spatial invariance. The Intra-modality Attentive Pooling (IAP) block mines and pools discriminative semantic cues in each modality. The Cross-modality Attentive Pooling (CAP) block then learns the different contributions of the two modalities, which further guides the pooling of the selected discriminative semantic cues from each modality. We further show that the proposed model is interpretable, which helps to understand the mechanisms of both scene classification and multi-modal fusion in MAPNet. Extensive experiments and analysis on the SUN RGB-D Dataset and NYU Depth Dataset V2 show the superiority of MAPNet over current state-of-the-art methods. (C) 2019 Elsevier Ltd. All rights reserved.
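The two pooling ideas described in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch rendering, assuming 1x1-convolution attention scores and a linear modality gate: an IAP-style block softmax-weights spatial locations before pooling (orderless, hence spatially invariant), and a CAP-style block learns per-modality contribution weights before fusion. The class names, layer shapes, and gating form are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch only; not the authors' exact MAPNet blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraModalityAttentivePool(nn.Module):
    """IAP-style block: a 1x1 conv scores each spatial location, softmax
    over H*W gives attention weights, and the feature map is reduced to a
    single attended vector, making the result orderless/spatially invariant."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        attn = F.softmax(self.score(x).view(b, 1, h * w), dim=-1)  # (B, 1, HW)
        feats = x.view(b, c, h * w)                                # (B, C, HW)
        return (feats * attn).sum(dim=-1)                          # (B, C)

class CrossModalityAttentivePool(nn.Module):
    """CAP-style block: learns two contribution weights from the
    concatenated pooled vectors and fuses the modalities accordingly."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Linear(2 * channels, 2)

    def forward(self, rgb_vec, depth_vec):           # each: (B, C)
        w = F.softmax(self.gate(torch.cat([rgb_vec, depth_vec], dim=1)), dim=1)
        return w[:, 0:1] * rgb_vec + w[:, 1:2] * depth_vec         # (B, C)

# Usage with dummy CNN feature maps from the two modality streams:
iap_rgb = IntraModalityAttentivePool(512)
iap_depth = IntraModalityAttentivePool(512)
cap = CrossModalityAttentivePool(512)
rgb_map, depth_map = torch.randn(4, 512, 7, 7), torch.randn(4, 512, 7, 7)
fused = cap(iap_rgb(rgb_map), iap_depth(depth_map))  # (4, 512) -> classifier
```

The design point the abstract emphasizes is that the weighted sum over spatial positions discards location, which is what makes the pooled representation robust to the high spatial variability of indoor semantic cues, while the learned gate replaces naive concatenation or summation of the two modalities.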

