Semantic Amodal Segmentation

Abstract

Common visual recognition tasks such as classification, object detection, and semantic segmentation are rapidly reaching maturity, and given the recent rate of progress, it is not unreasonable to conjecture that techniques for many of these problems will approach human levels of performance in the next few years. In this paper we look to the future: what is the next frontier in visual recognition? We offer one possible answer to this question. We propose a detailed image annotation that captures information beyond the visible pixels and requires complex reasoning about full scene structure. Specifically, we create an amodal segmentation of each image: the full extent of each region is marked, not just the visible pixels. Annotators outline and name all salient regions in the image and specify a partial depth order. The result is a rich scene structure, including visible and occluded portions of each region, figure-ground edge information, semantic labels, and object overlap. We create two datasets for semantic amodal segmentation. First, we label 500 images in the BSDS dataset with multiple annotators per image, allowing us to study the statistics of human annotations. We show that the proposed full scene annotation is surprisingly consistent between annotators, including for regions and edges. Second, we annotate 5000 images from COCO. This larger dataset allows us to explore a number of algorithmic ideas for amodal segmentation and depth ordering. We introduce novel metrics for these tasks, and along with our strong baselines, define concrete new challenges for the community.
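To make the annotation structure concrete, the following is a minimal Python sketch of one annotated region as the abstract describes it: an amodal mask covering the region's full extent (occluded parts included), a visible mask, a semantic label, and a rank in the partial depth order. The class and field names here are illustrative assumptions, not the dataset's actual schema.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class AmodalRegion:
    """One salient region in a semantic amodal annotation (hypothetical schema)."""
    name: str                 # semantic label supplied by the annotator
    amodal_mask: np.ndarray   # boolean mask: full region extent, occluded parts included
    visible_mask: np.ndarray  # boolean mask: only the pixels visible in the image
    depth_rank: int           # position in the partial depth order (lower = closer)

    def occluded_mask(self) -> np.ndarray:
        """Pixels that belong to the region but are hidden by closer regions."""
        return self.amodal_mask & ~self.visible_mask

# Toy example: in a 4x4 image, a closer object hides the right half of a cat.
cat_amodal = np.zeros((4, 4), dtype=bool)
cat_amodal[1:3, 0:4] = True            # full extent of the cat: 8 pixels
cat_visible = cat_amodal.copy()
cat_visible[:, 2:] = False             # right half is occluded
cat = AmodalRegion("cat", cat_amodal, cat_visible, depth_rank=2)
assert cat.occluded_mask().sum() == 4  # the 4 hidden pixels are derived, not stored
```

One design point this sketch illustrates: storing the visible mask alongside the amodal mask lets the occluded portion be derived rather than annotated separately, which is consistent with the abstract's framing of the annotation as the full region extent plus a depth order.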
