Talking pictures: Temporal grouping and dialog-supervised person recognition


Abstract

We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of ‘supervision’ is the dialog cues: first, second and third person references (such as “I'm Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and final character naming on several hours of TV and movies.
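The temporal grouping idea (states as partitions of the k most recent face tracks, transitions between compatible consecutive partitions, dynamic programming inference) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `canonical` and `group_tracks`, the pairwise `affinity` matrix, and the scoring rule (sum of affinities between tracks placed in the same block) are assumptions made only for illustration. The abstract does not specify the actual features or learned weights.

```python
def canonical(labels):
    """Relabel block ids by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def group_tracks(affinity, k):
    """Exact DP over partitions of the k most recent tracks (hypothetical scoring).

    affinity[i][j] > 0 suggests tracks i and j show the same person, < 0 the opposite.
    Returns (best score, group id per track); a group id is the index of the
    first track in that group.
    """
    n = len(affinity)
    # State -> (best score so far, group ids of all tracks processed so far).
    best = {(): (0.0, ())}
    for t in range(n):
        nxt = {}
        for score, ids in best.values():
            # Tracks that can still be linked to track t: the k-1 previous ones.
            window = ids[-(k - 1):] if k > 1 else ()
            start = t - len(window)
            # Track t joins a block present in the window, or starts a new block "t".
            for g in set(window) | {t}:
                gain = sum(affinity[start + i][t]
                           for i, w in enumerate(window) if w == g)
                new_ids = ids + (g,)
                # The state is the partition of the k most recent tracks; only
                # successors that extend the current partition are generated,
                # so consecutive partitions always agree on shared tracks.
                state = canonical(new_ids[-k:])
                if state not in nxt or score + gain > nxt[state][0]:
                    nxt[state] = (score + gain, new_ids)
        best = nxt
    return max(best.values(), key=lambda entry: entry[0])


# Toy shot/reverse-shot sequence: tracks 0 and 2 look alike, as do tracks 1 and 3.
affinity = [
    [0, -1, 1, -1],
    [-1, 0, -1, 1],
    [1, -1, 0, -1],
    [-1, 1, -1, 0],
]
score, groups = group_tracks(affinity, k=3)
print(score, groups)
```

In this toy example a window of k=3 is large enough to link each track to the one two positions earlier, whereas k=2 can only compare adjacent tracks and leaves all four in separate groups. The number of states grows with the number of set partitions of k items, so a sketch like this is only practical for small k.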


Bibliographic record

Source
2010 IEEE Conference on Computer Vision and Pattern Recognition | 2010 | pp. 1014-1021 | 8 pages
Authors
Timothee Cour; Benjamin Sapp; Akash Nagle; Ben Taskar
Original format
PDF
Chinese Library Classification
TP391.41

