Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Detecting person presence in TV shows with linguistic and structural features

Abstract

Person detection and recognition in videos is a hard problem due to the intrinsic ambiguities of the sound and image channels and their interaction. Whatever method is used to extract person hypotheses from the audio or the image channels, person recognition in videos relies on a multimodal decision process that merges the different hypotheses produced in order to decide, for each frame, who is present in the video at the audio level, at the image level or at the content level (person mention in speech or inserted text boxes). In this framework the focus of this paper is to produce a list of person presence hypotheses from the audio channel of a video document only, to be used in addition to person presence detected at the image level by a multimodal fusion process. In this study we focus on the audio channel only, using two kinds of features: linguistic features corresponding to the way a person is mentioned by a speaker; structural features corresponding to the context of occurrence of a name in a show. We show that both sets of features are complementary and that good results can be achieved on a TV show corpus annotated with person presence labels.
