Venue: IEEE International Conference on Acoustics, Speech and Signal Processing

Detecting person presence in TV shows with linguistic and structural features

Abstract

Person detection and recognition in videos is a hard problem because of the intrinsic ambiguities of the sound and image channels and of their interaction. Whatever method is used to extract person hypotheses from the audio or image channels, person recognition in videos relies on a multimodal decision process that merges these hypotheses in order to decide, for each frame, who is present in the video at the audio level, at the image level, or at the content level (a person mentioned in speech or in inserted text boxes). Within this framework, this paper aims to produce a list of person presence hypotheses from the audio channel of a video document alone, to be used by a multimodal fusion process alongside the person presence detected at the image level. We use two kinds of features: linguistic features, which capture the way a speaker mentions a person, and structural features, which capture the context in which a name occurs in a show. We show that the two feature sets are complementary and that good results can be achieved on a TV show corpus annotated with person presence labels.
