A key challenge in generic object detection is handling large variations in object scale, pose, and viewpoint, and especially part deformations, when determining the location of specified object categories. Recent advances in deep neural networks have achieved promising results for object detection by extending traditional detection methodologies with convolutional neural network architectures. In this paper, we make an attempt to incorporate another traditional detection schema, Regionlet, into an end-to-end trained deep learning framework, and perform ablation studies on its behavior on multiple object detection datasets. More specifically, we propose a "region selection network" and a "gating network". The region selection network provides guidance on where to select the regions from which features are learned. Additionally, the gating network serves as a local feature selection module that selects and transforms feature maps to make them suitable for the detection task. It acts as soft Regionlet selection and pooling. The proposed network is trained end-to-end without additional effort. Extensive experiments and analysis on the PASCAL VOC dataset and the Microsoft COCO dataset show that the proposed framework achieves results comparable to the state of the art.