IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

AUDIO-VISUAL SYNCHRONY FOR DETECTION OF MONOLOGUES IN VIDEO ARCHIVES

Abstract

In this paper we present our approach to detecting monologues in video shots. A monologue shot is defined as a shot that contains a talking person in the video channel and the corresponding speech in the audio channel. Although motivated by the TREC 2002 Video Retrieval Track (VT02), the underlying approach of measuring synchrony between audio and video signals is also applicable to voice- and face-based biometrics, to assessing lip-synchronization quality in movie editing, and to speaker localization in video. Our approach is a two-part scheme. We first detect the occurrence of speech and of a face in a video shot. Among shots containing both speech and a face, we identify monologue shots as those in which the speech and facial movements are synchronized. To measure the synchrony between speech and facial movements we use a mutual-information-based measure. Experiments on the VT02 corpus indicate that using synchrony improves average precision by more than 50% (relative) compared with using face and speech information alone. Our synchrony-based monologue detector submission achieved the best average precision in VT02 among 18 different submissions.
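The abstract does not specify the audio and visual features or the exact mutual-information estimator, so the following is only a minimal sketch of the two-part scheme under stated assumptions. It assumes one scalar audio feature (e.g. frame energy) and one scalar visual feature (e.g. mouth-region motion) per video frame, and it assumes the two streams are jointly Gaussian, in which case mutual information reduces to I = -0.5 ln(1 - ρ²) with ρ their Pearson correlation. The function names, the features, and the `has_speech` / `has_face` flags (standing in for upstream speech and face detectors) are hypothetical and not necessarily the authors' method.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length feature sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant stream carries no synchrony information
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y, eps=1e-12):
    """Mutual information (nats) between two scalar streams, ASSUMING they are
    jointly Gaussian, in which case I(X;Y) = -0.5 * ln(1 - rho^2)."""
    rho = pearson(x, y)
    # Clamp so a perfect (or rounding-inflated) correlation stays finite.
    return -0.5 * math.log(max(1.0 - rho * rho, eps))


def monologue_score(audio_feat, visual_feat, has_speech, has_face):
    """Part 1 gates on (hypothetical) speech/face detections;
    part 2 scores audio-visual synchrony with Gaussian MI."""
    if not (has_speech and has_face):
        return float("-inf")  # cannot be a monologue; rank below every candidate
    return gaussian_mi(audio_feat, visual_feat)


def rank_shots(shots):
    """Return shot ids sorted from most to least monologue-like.
    shots maps shot id -> (audio_feat, visual_feat, has_speech, has_face)."""
    return sorted(shots, key=lambda sid: monologue_score(*shots[sid]), reverse=True)


# Hypothetical per-frame features for three shots.
audio = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]         # audio energy
mouth_synced = [0.0, 1.0, 0.1, 0.9, 0.0, 0.8]  # mouth motion that tracks the audio
mouth_dubbed = [0.5, 0.5, 0.6, 0.4, 0.5, 0.5]  # mouth motion unrelated to the audio
print(rank_shots({
    "dubbed": (audio, mouth_dubbed, True, True),
    "talking": (audio, mouth_synced, True, True),
    "no_face": (audio, mouth_synced, True, False),
}))  # → ['talking', 'dubbed', 'no_face']
```

Returning negative infinity for shots that fail the speech/face gate keeps them at the bottom of a ranked list, which suits an average-precision evaluation. The Gaussian assumption is what makes the MI a closed-form function of correlation; a histogram or kernel MI estimator could be swapped in without changing the gating and ranking logic.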