We propose an approach to discover class-specific pixels for theweakly-supervised semantic segmentation task. We show that properly combiningsaliency and attention maps allows us to obtain reliable cues capable ofsignificantly boosting the performance. First, we propose a simple yet powerfulhierarchical approach to discover the class-agnostic salient regions, obtainedusing a salient object detector, which otherwise would be ignored. Second, weuse fully convolutional attention maps to reliably localize the class-specificregions in a given image. We combine these two cues to discover class-specificpixels which are then used as an approximate ground truth for training a CNN.While solving the weakly supervised semantic segmentation task, we ensure thatthe image-level classification task is also solved in order to enforce the CNNto assign at least one pixel to each object present in the image.Experimentally, on the PASCAL VOC12 val and test sets, we obtain the mIoU of60.8% and 61.9%, achieving the performance gains of 5.1% and 5.2% compared tothe published state-of-the-art results. The code is made publicly available.
展开▼