Objects to be detected usually exhibit distinct characteristics across sub-regions and across aspect ratios. However, prevalent two-stage object detection methods extract Region-of-Interest (RoI) features by RoI pooling, which places little emphasis on these translation-variant feature components. We present feature selective networks that reform the feature representations of RoIs by exploiting their disparities among sub-regions and aspect ratios. Our network produces a sub-region attention bank and an aspect ratio attention bank for the whole image. The RoI-based sub-region attention map and aspect ratio attention map are selectively pooled from these banks and then used to refine the original RoI features for RoI classification. Equipped with a lightweight detection subnetwork, our network yields a consistent boost in detection performance on general ConvNet backbones (ResNet-101, GoogLeNet and VGG-16). Without bells and whistles, our detectors equipped with ResNet-101 achieve more than 3% mAP improvement over their counterparts on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO datasets.
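
The selective pooling step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the bank layouts, the pooling operator, the aspect ratio bins, or how the attention maps refine the RoI features, so the following are all assumptions made for illustration: one bank channel per sub-region of a k x k grid, three aspect ratio channels (tall, square, wide) with hypothetical thresholds, average pooling, and refinement by element-wise multiplication. The function names are likewise hypothetical.

```python
import numpy as np

def roi_pool(fmap, roi, k):
    """Average-pool a (C, H, W) map over a k x k grid inside roi = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    xs = np.linspace(x0, x1, k + 1).round().astype(int)
    ys = np.linspace(y0, y1, k + 1).round().astype(int)
    out = np.zeros((fmap.shape[0], k, k))
    for i in range(k):
        for j in range(k):
            # Guarantee that every bin covers at least one pixel.
            ya, yb = ys[i], max(ys[i + 1], ys[i] + 1)
            xa, xb = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = fmap[:, ya:yb, xa:xb].mean(axis=(1, 2))
    return out

def aspect_index(roi):
    """Map an RoI to an aspect ratio channel (thresholds are assumed, not from the paper)."""
    x0, y0, x1, y1 = roi
    ratio = (x1 - x0) / (y1 - y0)
    if ratio < 0.75:
        return 0  # tall
    if ratio <= 1.33:
        return 1  # roughly square
    return 2      # wide

def refine_roi_features(features, sub_bank, ar_bank, roi, k):
    """Selectively pool both attention maps for one RoI and refine its pooled features.

    features: (C, H, W) backbone feature map
    sub_bank: (k*k, H, W) sub-region attention bank, channel i*k+j serves bin (i, j)
    ar_bank:  (3, H, W) aspect ratio attention bank
    """
    pooled = roi_pool(sub_bank, roi, k)                  # (k*k, k, k)
    idx = np.arange(k * k).reshape(k, k)
    rows, cols = np.arange(k)[:, None], np.arange(k)[None, :]
    sub_att = pooled[idx, rows, cols]                    # bin (i, j) reads channel i*k+j

    a = aspect_index(roi)
    ar_att = roi_pool(ar_bank[a:a + 1], roi, k)[0]       # only the matching channel

    roi_feats = roi_pool(features, roi, k)               # (C, k, k)
    # Assumed refinement: element-wise reweighting by both attention maps.
    refined = roi_feats * sub_att[None] * ar_att[None]
    return refined, sub_att, ar_att
```

Because both banks are computed once for the whole image, each RoI only pays for the pooling itself, which is in keeping with the lightweight detection subnetwork described above.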



