Audiovisual speaker diarization of TV series

Abstract

Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, and wide variations in intonation may hide the inter-speaker variability and make audio-based speaker diarization approaches error-prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed using each modality separately; the two resulting partitions of the instance set are then optimally matched; finally, the remaining instances, which correspond to cases of disagreement between the two modalities, are processed. The results obtained by applying this multi-modal approach to fictional films outperform those obtained by relying on a single modality.
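The abstract does not say how the two partitions are "optimally matched". One common criterion, assumed here rather than taken from the paper, is to pick the one-to-one mapping between audio clusters and video clusters that maximizes the number of instances on which both modalities agree. The sketch below applies that criterion by exhaustive search, which is only practical for the handful of speakers in a single dialogue scene. The function names `match_partitions` and `split_by_agreement` and the toy labels are hypothetical; the disputed instances it returns correspond to the cases of disagreement that the method handles in a final step not shown here.

```python
from collections import Counter
from itertools import permutations


def match_partitions(audio_labels, video_labels):
    """Return the one-to-one mapping {audio cluster: video cluster} that
    maximizes the number of instances labeled consistently by both
    modalities, together with that number (assumed criterion, brute force)."""
    if len(audio_labels) != len(video_labels):
        raise ValueError("both partitions must label the same instances")
    # Contingency counts: instances shared by each (audio, video) cluster pair.
    overlap = Counter(zip(audio_labels, video_labels))
    audio_ids = sorted(set(audio_labels))
    video_ids = sorted(set(video_labels))
    # Enumerate every injective mapping from the smaller cluster set into the
    # larger one; surplus clusters on the larger side stay unmatched.
    if len(audio_ids) <= len(video_ids):
        candidates = (dict(zip(audio_ids, p))
                      for p in permutations(video_ids, len(audio_ids)))
    else:
        candidates = (dict(zip(p, video_ids))
                      for p in permutations(audio_ids, len(video_ids)))
    best_map, best_score = {}, -1
    for mapping in candidates:
        # mapping.items() yields (audio, video) tuples, the Counter's keys.
        score = sum(overlap[pair] for pair in mapping.items())
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map, best_score


def split_by_agreement(audio_labels, video_labels, mapping):
    """Split instance indices into those where the matched clusters agree and
    those where the two modalities disagree (left for a later step)."""
    agreed, disputed = [], []
    for i, (a, v) in enumerate(zip(audio_labels, video_labels)):
        (agreed if mapping.get(a) == v else disputed).append(i)
    return agreed, disputed


if __name__ == "__main__":
    # Hypothetical dialogue scene: six speech turns labeled by each modality.
    audio = [0, 0, 1, 1, 0, 1]              # clusters from the audio modality
    video = ["A", "A", "B", "B", "B", "B"]  # clusters from the video modality
    mapping, score = match_partitions(audio, video)
    print(mapping, score)                              # {0: 'A', 1: 'B'} 5
    print(split_by_agreement(audio, video, mapping))   # ([0, 1, 2, 3, 5], [4])
```

For scenes with many clusters, the same maximization can be solved in polynomial time as an assignment problem (for example with the Hungarian algorithm, available as `scipy.optimize.linear_sum_assignment` with `maximize=True`) instead of enumerating permutations.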

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2015 | 5 pages
Authors
X. Bost; G. Linares; S. Gueye
Original format
PDF
Chinese Library Classification
TN912-53
Keywords
Speaker diarization; multi-modal fusion; video structuration

Similar literature

1. A Multimodal Approach to Speaker Diarization on TV Talk-Shows [J]. Vallet F., Essid S., Carrive J. IEEE Transactions on Multimedia, 2013, no. 3.
2. Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization [J]. Itshak Lapidot, Alon Shoa, Tal Furmanov. Computer Speech and Language, 2017, Sep. issue.
3. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news [J]. Dabbabi Karim, Hajji Salah, Cherif Adnen. International Journal of Speech Technology, 2019, no. 4.
4. Audiovisual speaker diarization of TV series [C]. Bost Xavier, Linares Georges, Gueye Serigne. IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
5. Automatic Speaker Recognition and Diarization in Co-Channel Speech [D]. Shokouhi, Navid. 2017.
6. Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research [O]. Lukas Fürer, Nathalie Schenk, Volker Roth. 2020.
7. Constrained speaker diarization of TV series based on visual patterns [O]. Xavier Bost, Georges Linares. 2014.
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015.