Journal: Autonomous Robots

Viewpoint invariant semantic object and scene categorization with RGB-D sensors


Abstract

Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes, and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network, as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as convolutional hypercube pyramid (HP-CNN), that is able to encode discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons based on an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that our HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods for several recognition tasks by a significant margin.
