Multimedia Tools and Applications

Weakly supervised detection with decoupled attention-based deep representation



Abstract

Training object detectors with only image-level annotations is an important problem with a variety of applications. However, due to the deformable nature of objects, a target object delineated by a bounding box always includes irrelevant context and occlusions, which causes large intra-class object variation and ambiguity in distinguishing objects from background. For this reason, identifying the object of interest in a substantial amount of cluttered background is very challenging. In this paper, we propose a decoupled attention-based deep model to optimize region-based object representation. Different from existing approaches that pose object representation in a single-tower model, our proposed network decouples object representation into two separate modules, i.e., image representation and attention localization. The image representation module captures content-based semantic representation, while the attention localization module regresses an attention map that simultaneously highlights the locations of the discriminative object parts and down-weights the irrelevant background present in the image. The combined representation alleviates the impact of the noisy context and occlusions inside an object bounding box. As a result, object-background ambiguity can be largely reduced and background regions can be suppressed effectively. In addition, the proposed object representation model can be seamlessly integrated into a state-of-the-art weakly supervised detection framework, and the entire model can be trained end-to-end. We extensively evaluate the detection performance on the PASCAL VOC 2007, VOC 2010, and VOC 2012 datasets. Experimental results demonstrate that our approach effectively improves weakly supervised object detection.
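The abstract does not give implementation details, but the decoupled design it describes, one branch producing region features and a separate branch regressing an attention map that down-weights background locations before the two are combined, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes; all names (`features`, `w_att`, `region_vector`) and the sigmoid attention head are hypothetical, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical region feature map from a backbone: C channels over an H x W grid
# (the image-representation branch's output for one region proposal).
C, H, W = 8, 4, 4
features = rng.standard_normal((C, H, W))

# Attention-localization branch: a 1x1-conv-like projection to one score per
# spatial location, squashed to (0, 1) so background cells are pushed toward 0.
w_att = rng.standard_normal(C)
scores = np.tensordot(w_att, features, axes=1)   # (H, W) raw attention logits
attention = 1.0 / (1.0 + np.exp(-scores))        # sigmoid attention map

# Combine the two branches: attention-weighted average pooling over the region,
# so discriminative parts dominate the region's final representation.
weighted = features * attention                  # broadcast (H, W) over channels
region_vector = weighted.sum(axis=(1, 2)) / attention.sum()

print(region_vector.shape)   # (8,)
```

In a real network both branches would be learned convolutional heads trained end-to-end with the detection loss; the key point the sketch shows is that attention is estimated separately from content and applied multiplicatively before pooling, rather than being entangled in a single-tower representation.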


