Conference: Asian Conference on Computer Vision

Video Object Segmentation with Language Referring Expressions

Abstract

Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of a target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and time-consuming. In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. Besides being a more practical and natural way of pointing out a target object, using language specifications can help to avoid drift and make the system more robust to complex dynamics and appearance variations. Leveraging recent advances in language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions. To evaluate our approach we augment the popular video object segmentation benchmarks, DAVIS16 and DAVIS17, with language descriptions of target objects. We show that our language-supervised approach performs on par with methods that have access to a pixel-level mask of the target object on DAVIS16 and is competitive with methods using scribbles on the challenging DAVIS17 dataset.
