2010 IEEE Conference on Computer Vision and Pattern Recognition

Talking pictures: Temporal grouping and dialog-supervised person recognition


Abstract

We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of ‘supervision’ is the dialog cues: first-, second- and third-person references (such as “I'm Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and final character naming on several hours of TV and movies.
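The state space described in the abstract (partitions of the k most recent face tracks, with transitions between compatible consecutive partitions) can be illustrated with a small dynamic program. The sketch below is not the authors' model: the signed pairwise `affinity` matrix, the scoring rule, and the names `partitions`, `canon` and `group_tracks` are assumptions made for illustration, standing in for the learned appearance, geometric and film-editing cues that the paper uses.

```python
from itertools import combinations

def partitions(k):
    """All set partitions of k items, as restricted-growth label tuples."""
    out = [()]
    for _ in range(k):
        out = [p + (lab,) for p in out
               for lab in range(max(p, default=-1) + 2)]
    return out

def canon(labels):
    """Relabel blocks 0, 1, 2, ... in order of first appearance."""
    seen = {}
    return tuple(seen.setdefault(lab, len(seen)) for lab in labels)

def group_tracks(affinity, k):
    """Viterbi over partitions of a sliding window of k face tracks.

    affinity[i][j] is a hypothetical signed score: positive if tracks
    i and j look like the same person, negative otherwise.
    """
    n = len(affinity)
    k = min(k, n)
    states = partitions(k)

    def local(p, t, first):
        # The first window scores every pair; later windows score only
        # pairs involving the newest track, so no pair is counted twice.
        pairs = (combinations(range(k), 2) if first
                 else ((a, k - 1) for a in range(k - 1)))
        return sum(affinity[t + a][t + b] for a, b in pairs if p[a] == p[b])

    best = {p: (local(p, 0, True), [p]) for p in states}
    for t in range(1, n - k + 1):
        new = {}
        for q in states:
            head = canon(q[:-1])
            # Compatible predecessors agree with q on the k-1 shared tracks.
            cands = [v for p, v in best.items() if canon(p[1:]) == head]
            score, path = max(cands, key=lambda c: c[0])
            new[q] = (score + local(q, t, False), path + [q])
        best = new
    _, path = max(best.values(), key=lambda c: c[0])

    # Merge the window partitions into one global grouping (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for t, p in enumerate(path):
        for a, b in combinations(range(k), 2):
            if p[a] == p[b]:
                parent[find(t + b)] = find(t + a)
    return canon(find(i) for i in range(n))

# Toy input: tracks 0 and 2 show one person, tracks 1 and 3 another.
same = [[0, 2], [1, 3]]
aff = [[1 if any(i in g and j in g for g in same) else -1
        for j in range(4)] for i in range(4)]
labels = group_tracks(aff, k=3)
```

The number of states is the k-th Bell number, so a sketch like this is practical only for small windows; it also omits the discriminative learning of the scoring function mentioned in the abstract.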
