Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is “artificial” and may reduce the recognition accuracy for images or sub-images of arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvements made for this competition.
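
The fixed-length property described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the floor/ceil bin boundaries, and the random feature maps are all assumptions chosen for illustration. The sketch max-pools a (channels, height, width) feature map over an n×n grid at each level and concatenates the results, so the output length depends only on the channel count and the levels, not on the spatial size.

```python
import math
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n*n).

    Hypothetical helper for illustration; the levels and bin boundaries
    are assumptions, not taken from the paper.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i; floor start and ceil end keep the bin non-empty.
            r0 = (i * height) // n
            r1 = math.ceil((i + 1) * height / n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = math.ceil((j + 1) * width / n)
                # One max per channel for this spatial bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small_map = rng.standard_normal((8, 13, 13))   # stand-in for conv features
large_map = rng.standard_normal((8, 20, 31))   # different spatial size

small_vec = spatial_pyramid_pool(small_map)
large_vec = spatial_pyramid_pool(large_map)

# Detection-style use: pool an arbitrary region of an already computed map.
region_vec = spatial_pyramid_pool(large_map[:, 3:12, 5:25])

print(small_vec.shape, large_vec.shape, region_vec.shape)
# prints: (168,) (168,) (168,)
```

With 8 channels and levels 1, 2, and 4, every input yields 8 × (1 + 4 + 16) = 168 values. Slicing a region out of a shared feature map, as in the last call, mirrors the idea of computing convolutional features once per image and pooling per candidate window.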