Conference: International Conference on Text, Speech and Dialogue

Audio-Video Speaker Diarization for Unsupervised Speaker and Face Model Creation

Abstract

Our goal is to create speaker models in the audio domain and face models in the video domain from a set of videos in an unsupervised manner. Such models can later be used for speaker identification in the audio domain (answering the question "Who was speaking and when?") and/or for face recognition ("Who was seen and when?") in given videos that contain speaking persons. The proposed system is based on an audio-video diarization system that tries to overcome the disadvantages of the individual modalities. Experiments on broadcasts of Czech parliament meetings show that the proposed combination of individual audio and video diarization systems yields an improvement in the diarization error rate (DER).
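
For readers unfamiliar with the metric named in the abstract, DER is commonly defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below is a simplified, frame-level illustration of that common definition and is not the evaluation procedure used in the paper. The function name `frame_der`, the per-frame label lists, and the use of `None` for non-speech are all assumptions made for this example; it ignores overlapping speech and scoring collars, and it finds the cluster-to-speaker mapping by brute force, which is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate (simplified, hypothetical helper).

    reference, hypothesis: equal-length lists of per-frame labels,
    with None marking non-speech. Hypothesis labels are arbitrary
    cluster IDs, so the best one-to-one mapping onto reference
    speakers is searched exhaustively.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    speech_frames = sum(r is not None for r in reference)
    if speech_frames == 0:
        raise ValueError("reference contains no speech frames")

    ref_ids = list({r for r in reference if r is not None})
    hyp_ids = list({h for h in hypothesis if h is not None})
    # Dummy slots let a hypothesis cluster stay unmatched to any speaker.
    slots = ref_ids + [("unmatched", i) for i in range(len(hyp_ids))]

    best_errors = None
    for assignment in permutations(slots, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, assignment))
        errors = 0
        for r, h in zip(reference, hypothesis):
            mapped = mapping[h] if h is not None else None
            # Counts misses, false alarms, and speaker confusions alike.
            if r != mapped:
                errors += 1
        if best_errors is None or errors < best_errors:
            best_errors = errors

    return best_errors / speech_frames


reference = ["A", "A", "A", "B", "B", None, "B", "B"]
hypothesis = [0, 0, 1, 1, 1, 1, None, 1]
# One confusion, one false alarm, one miss over 7 speech frames.
print(frame_der(reference, hypothesis))
```

In this toy input the best mapping is cluster 0 to speaker A and cluster 1 to speaker B, giving three erroneous frames out of seven reference speech frames. Production scoring tools typically work on time segments and use an optimal assignment algorithm instead of exhaustive search.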