17th IEEE International Conference on Image Processing

Robust visual features for the multimodal identification of unregistered speakers in TV talk-shows



Abstract

In this paper, we propose a novel multimodal method for identifying unregistered speakers in TV talk-shows, using a semi-supervised learning approach based on Support Vector Machines. Our study shows that specific visual features are very effective for this particular type of video content, which is edited from multi-camera recordings. These visual features, motivated by prior knowledge of how the TV director chooses the appropriate shots, bring a significant improvement in identification accuracy when used together with classic audio Mel-frequency cepstral coefficients (+8% compared to various baseline systems, in particular a standard audio-only system).
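The abstract names the ingredients (audio and visual features, an SVM, semi-supervised learning) but not how they are combined. As a rough illustration only, the sketch below assumes early fusion (concatenating the two feature vectors per segment), a two-speaker problem, a small linear SVM trained by subgradient descent, and self-training as the semi-supervised strategy. All function names, parameters, and the synthetic data are hypothetical and are not taken from the paper.

```python
import random

def fuse(audio, visual):
    """Early fusion (an assumption): concatenate audio and visual features,
    plus a constant 1.0 so the SVM bias is learned as an ordinary weight."""
    return list(audio) + list(visual) + [1.0]

def score(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def predict(w, x):
    return 1 if score(w, x) > 0 else -1

def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the
    L2-regularized hinge loss; labels are +1 / -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * score(w, X[i]) < 1   # inside margin or misclassified
            w = [(1 - lr * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def self_train(X_lab, y_lab, X_unlab, rounds=3, threshold=1.0):
    """Self-training: pseudo-label confident unlabeled segments, then retrain."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    w = train_linear_svm(X, y)
    for _ in range(rounds):
        confident = [x for x in pool if abs(score(w, x)) >= threshold]
        if not confident:
            break
        pool = [x for x in pool if abs(score(w, x)) < threshold]
        for x in confident:
            X.append(x)
            y.append(predict(w, x))  # pseudo-label from the current model
        w = train_linear_svm(X, y)
    return w

def make_segments(n, label, rng):
    """Synthetic stand-ins: 3 audio values (in place of MFCC statistics)
    and 2 visual values per segment, clustered around the label."""
    segments = []
    for _ in range(n):
        audio = [label + rng.uniform(-0.3, 0.3) for _ in range(3)]
        visual = [label + rng.uniform(-0.3, 0.3) for _ in range(2)]
        segments.append(fuse(audio, visual))
    return segments

rng = random.Random(0)
X_lab = make_segments(2, 1, rng) + make_segments(2, -1, rng)    # few labeled segments
y_lab = [1, 1, -1, -1]
X_unlab = make_segments(20, 1, rng) + make_segments(20, -1, rng)  # unlabeled segments
w = self_train(X_lab, y_lab, X_unlab)
n_pos = sum(predict(w, x) == 1 for x in X_unlab)
print(n_pos, "of", len(X_unlab), "unlabeled segments assigned to speaker +1")
```

The synthetic clusters are deliberately well separated, so this shows only the mechanics of fusing features and growing the training set with pseudo-labels; it says nothing about the accuracy figures reported in the abstract.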
