In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector trained with a low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade as the IoU threshold increases. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) inference-time mismatch between the IoUs for which the detector is optimal and those of the input hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is proposed to address these problems. It consists of a sequence of detectors trained with increasing IoU thresholds, to be sequentially more selective against close false positives. The detectors are trained stage by stage, leveraging the observation that the output of a detector is a good distribution for training the next higher-quality detector. The resampling of progressively improved hypotheses guarantees that all detectors have a positive set of examples of equivalent size, reducing the overfitting problem. The same cascade procedure is applied at inference, enabling a closer match between the hypotheses and the detector quality of each stage. A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset. Experiments also show that the Cascade R-CNN is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength. The code will be made available at https://github.com/zhaoweicai/cascade-rcnn.
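
To make the role of the IoU threshold concrete, the following minimal Python sketch computes IoU for axis-aligned boxes and counts how many of a fixed set of proposals qualify as positives as the threshold rises. It is an illustration only and is not taken from the released Cascade R-CNN code: the function names, the (x1, y1, x2, y2) box format, and the threshold values 0.5, 0.6, 0.7 are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) with x2 > x1, y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero so disjoint boxes give no overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(
    proposals: Sequence[Box], gt_boxes: Sequence[Box], threshold: float
) -> List[bool]:
    """Mark a proposal positive if its best IoU with any ground-truth box
    reaches the threshold; otherwise it is a negative."""
    return [
        max((iou(p, g) for g in gt_boxes), default=0.0) >= threshold
        for p in proposals
    ]


def positives_per_threshold(
    proposals: Sequence[Box],
    gt_boxes: Sequence[Box],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed example values
) -> List[int]:
    """Count positives at each threshold for the SAME proposals.
    With fixed proposals the count can only shrink as the threshold grows."""
    return [sum(label_proposals(proposals, gt_boxes, t)) for t in thresholds]


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    # Proposals shifted progressively further from the ground-truth box.
    props = [
        (0.0, 0.0, 10.0, 10.0),
        (1.0, 0.0, 11.0, 10.0),
        (2.0, 0.0, 12.0, 10.0),
        (3.0, 0.0, 13.0, 10.0),
        (5.0, 0.0, 15.0, 10.0),
    ]
    print(positives_per_threshold(props, gt))
```

With proposals held fixed, the positive count falls as the threshold rises, which is the vanishing-positives effect described above. In the cascade, each stage would instead receive the boxes refined by the previous stage, so more of them would clear the higher threshold; that refinement step needs a trained regressor and is not modelled here.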