Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

He Kaiming; Zhang Xiangyu; Ren Shaoqing; Sun Jian

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is “artificial” and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
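The two ideas in the abstract, a fixed-length output from a map of any size and pooling arbitrary windows from one shared feature map, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The helper names spp_pool and pool_region are hypothetical, and several details are assumptions made for illustration: the pyramid levels (4, 2, 1), the floor/ceil bin boundaries (similar to the rounding used by common adaptive max-pooling layers), the total feature-map stride of 16, and the simplified box-to-feature-map projection. With C channels and pyramid levels n_1, ..., n_k, the output length is C × (n_1² + ... + n_k²), for example 256 × (16 + 4 + 1) = 5376 for a 256-channel map, whatever the map's height and width.

```python
from typing import List, Sequence, Tuple

# A feature map is stored as [channels][height][width] nested lists.
FeatureMap = List[List[List[float]]]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def spp_pool(fmap: FeatureMap, levels: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Max-pool a C x H x W map into n x n bins for each pyramid level n.

    Hypothetical helper: the levels and the bin rounding are assumptions.
    Output length is C * sum(n * n for n in levels), independent of H and W.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    features: List[float] = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: [floor(i*H/n), ceil((i+1)*H/n)), never empty.
            r0, r1 = (i * height) // n, _ceil_div((i + 1) * height, n)
            for j in range(n):
                c0, c1 = (j * width) // n, _ceil_div((j + 1) * width, n)
                for ch in range(channels):
                    features.append(max(
                        fmap[ch][r][c] for r in range(r0, r1) for c in range(c0, c1)
                    ))
    return features


def pool_region(fmap: FeatureMap, box: Tuple[int, int, int, int],
                stride: int = 16, levels: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Pool one image-space window (x0, y0, x1, y1) from a shared feature map.

    Hypothetical helper: the box is projected by dividing by an assumed total
    stride, clamped to the map, and kept at least one cell wide and tall.
    """
    x0, y0, x1, y1 = box
    height, width = len(fmap[0]), len(fmap[0][0])
    fx0 = min(max(x0 // stride, 0), width - 1)
    fy0 = min(max(y0 // stride, 0), height - 1)
    fx1 = max(min(_ceil_div(x1, stride), width), fx0 + 1)
    fy1 = max(min(_ceil_div(y1, stride), height), fy0 + 1)
    # Slice the shared map; no convolution is recomputed per window.
    crop = [[row[fx0:fx1] for row in channel[fy0:fy1]] for channel in fmap]
    return spp_pool(crop, levels)


if __name__ == "__main__":
    # Two 2-channel feature maps with different spatial sizes.
    small = [[[float(r * 5 + c) for c in range(5)] for r in range(3)] for _ in range(2)]
    large = [[[float(r * 11 + c) for c in range(11)] for r in range(7)] for _ in range(2)]
    print(len(spp_pool(small)), len(spp_pool(large)))  # both 2 * 21 = 42
    # The map is computed once; several windows are pooled from it.
    for box in [(0, 0, 32, 32), (16, 0, 96, 48)]:
        print(box, len(pool_region(large, box)))
```

Because pool_region only slices the feature map that was already computed, every candidate window reuses a single convolutional pass, which is what the abstract means by avoiding repeated computation of convolutional features. The window-projection rule used in the paper may differ from the floor/ceil rounding assumed here.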

Bibliographic record

Source
Pattern Analysis and Machine Intelligence, IEEE Transactions on | 2015, No. 9 | pp. 1904-1916 | 13 pages
Authors
He Kaiming; Zhang Xiangyu; Ren Shaoqing; Sun Jian
Affiliation

Visual Computing Group, Microsoft Research, Beijing, China

Original format: PDF
Text language: English
Keywords
Convolutional Neural Networks; Image Classification; Object Detection; Spatial Pyramid Pooling

Similar literature

1. Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks [J]. Wanli Yang, Yimin Chen, Chen Huang. Future Internet, 2018, No. 12
2. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks [J]. Qu Tao, Zhang Quanyuan, Sun Shilei. Multimedia Tools and Applications, 2017, No. 20
3. Compact Spatial Pyramid Pooling Deep Convolutional Neural Network Based Hand Gestures Decoder [J]. Akm Ashiquzzaman, Hyunmin Lee, Kwangki Kim. Applied Sciences, 2020, No. 21
4. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition [C]. Kaiming He, Xiangyu Zhang, Shaoqing Ren. European Conference on Computer Vision, 2014
5. Hyperparameter Optimization of Deep Convolutional Neural Networks Architectures for Object Recognition [D]. Saleh Albelwi. 2018
6. Group and Shuffle Convolutional Neural Networks with Pyramid Pooling Module for Automated Pterygium Segmentation [O]. Siti Raihanah Abdani, Mohd Asyraf Zulkifley, Nuraisyah Hani Zulkifley. 2021
7. Spatial pyramid pooling in deep convolutional networks for visual recognition [O]. Kaiming He, Xiangyu Zhang, Shaoqing Ren. 2016
