Video Object Segmentation with Language Referring Expressions

Abstract

Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of the target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and time-consuming. In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. Besides being a more practical and natural way of pointing out a target object, language specifications can help to avoid drift and make the system more robust to complex dynamics and appearance variations. Leveraging recent advances in language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions. To evaluate our approach, we augment the popular video object segmentation benchmarks DAVIS16 and DAVIS17 with language descriptions of target objects. We show that our language-supervised approach performs on par with methods that have access to a pixel-level mask of the target object on DAVIS16 and is competitive with methods using scribbles on the challenging DAVIS17 dataset.

Bibliographic record

Source
Asian Conference on Computer Vision, 2018, pp. 123-141 (19 pages)
Authors
Anna Khoreva; Anna Rohrbach; Bernt Schiele
Original format: PDF

Similar literature

1. Weakly supervised video object segmentation initialized with referring expression [J]. Bu Xiaoqing, Sun Yukuan, Wang Jianming. Neurocomputing, 2021, Sep. 17 issue.
2. Object proposals for salient object segmentation in videos [J]. Rahma Kalboussi, Aymen Azaza, Joost van de Weijer. Multimedia Tools and Applications, 2020, No. 13-14.
3. Guided Co-Segmentation Network for Fast Video Object Segmentation [J]. Liu Weide, Lin Guosheng, Zhang Tianyi. IEEE Transactions on Circuits and Systems for Video Technology, 2021, No. 4.
4. Video Object Segmentation with Language Referring Expressions [C]. Anna Khoreva, Anna Rohrbach, Bernt Schiele. Asian Conference on Computer Vision, 2019.
5. Object Localization from RGB-D Images and Spatial Referring Expressions [D]. Mauceri, Cecilia. 2021.
6. Referring strategies in American Sign Language and English (with co-speech gesture): The role of modality in referring to non-nameable objects [O]. Zed Sevcikova Sehyr, Brenda Nicodemus, Jennifer Petrich.
7. Video Object Segmentation with Language Referring Expressions [O]. Anna Khoreva, Anna Rohrbach, Bernt Schiele. 2019.