Audiovisual speaker diarization of TV series

Abstract

Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, and wide variations in intonation may hide the inter-speaker variability and make audio-based speaker diarization approaches error-prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed with each modality; the two resulting partitions of the instance set are then optimally matched; finally, the remaining instances, which correspond to cases of disagreement between the two modalities, are processed. The results obtained by applying such a multi-modal approach to fictional films outperform those obtained by relying on a single modality.
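The matching step described above can be illustrated with a minimal sketch. The abstract does not say how the two partitions are matched or how the disputed instances are finally processed, so everything here is an assumption: the function name `match_partitions`, the cluster labels, and the choice of maximizing the number of co-assigned instances by exhaustive search over one-to-one pairings. The sketch only identifies the instances on which the two modalities disagree; it does not resolve them.

```python
from itertools import permutations


def match_partitions(audio_labels, video_labels):
    """Pair audio clusters with video clusters so that total overlap is maximal.

    Hypothetical helper, not the paper's implementation. Both arguments give
    one cluster label per instance, in the same instance order.
    Returns (mapping, agreed, disputed), where mapping sends an audio cluster
    to its matched video cluster and the other two are lists of instance indices.
    """
    if len(audio_labels) != len(video_labels):
        raise ValueError("both partitions must label the same instances")

    audio_ids = sorted(set(audio_labels))
    video_ids = sorted(set(video_labels))

    # overlap[(a, v)] = number of instances in audio cluster a and video cluster v
    overlap = {(a, v): 0 for a in audio_ids for v in video_ids}
    for a, v in zip(audio_labels, video_labels):
        overlap[(a, v)] += 1

    # Enumerate every one-to-one pairing; the smaller side is fully matched.
    if len(audio_ids) <= len(video_ids):
        candidates = (list(zip(audio_ids, perm))
                      for perm in permutations(video_ids, len(audio_ids)))
    else:
        candidates = (list(zip(perm, video_ids))
                      for perm in permutations(audio_ids, len(video_ids)))

    best_pairs, best_score = [], -1
    for pairs in candidates:
        score = sum(overlap[p] for p in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score

    mapping = dict(best_pairs)

    # Instances whose two labels fall in matched clusters are "agreed";
    # all others are left for a later, unspecified processing step.
    agreed, disputed = [], []
    for i, (a, v) in enumerate(zip(audio_labels, video_labels)):
        (agreed if mapping.get(a) == v else disputed).append(i)
    return mapping, agreed, disputed


# Made-up labels for six instances of one dialogue scene.
audio = ["a0", "a0", "a1", "a1", "a1", "a0"]
video = ["v1", "v1", "v0", "v0", "v1", "v1"]
mapping, agreed, disputed = match_partitions(audio, video)
print(mapping)   # {'a0': 'v1', 'a1': 'v0'}
print(disputed)  # [4]
```

Exhaustive search is only practical for the handful of speakers typical of a dialogue scene; with many clusters, a polynomial-time assignment solver such as the Hungarian algorithm would be the usual substitute.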
