Speaker Diarization and Linking of Meeting Data

Marc Ferràs; Srikanth Madikeri; Hervé Bourlard

Abstract

Finding who spoke when in a collection of recordings, with speakers being uniquely identified across the database, is a challenging task. In this scenario, reasonable computing times and acoustic variation across recordings remain two major concerns to address in state-of-the-art speaker diarization systems. This paper extends prior work on diarizing large speech datasets using algorithms that scale well with increasing amounts of data while compensating for across-recording variability. We follow a two-stage approach performing speaker diarization and speaker linking, the former focusing on local within-recording speaker changes and the latter focusing on global speaker changes across the database. In this study, we explore how these two modules interact with each other, while proposing a diarization fusion approach that prevents diarization errors from propagating to the linking stage. We further explore the diarization fusion for speaker linking using different linking strategies and speaker modeling variants. Evaluation on single distant microphone data from the Augmented Multi-party Interaction (AMI) corpus shows the effectiveness of the fusion approach after speaker linking and intersession variability modeling via joint factor analysis.
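
The linking stage described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each within-recording speaker cluster has already been summarized as a fixed-length vector (a stand-in for an i-vector or JFA speaker factor), and it uses a plain greedy Ward criterion with a hypothetical stopping threshold `max_cost`. The function names and the toy vectors are illustrative only.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["mean"], b["mean"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * sq_dist


def link_speakers(local_speakers, max_cost):
    """Greedy Ward linking of per-recording speaker vectors.

    local_speakers maps (recording_id, local_label) -> vector.
    Returns a dict mapping each key to a global speaker index.
    max_cost is an assumed stopping threshold, not a value from the paper.
    """
    clusters = [
        {"members": [key], "mean": list(vec), "n": 1}
        for key, vec in sorted(local_speakers.items())
    ]
    while len(clusters) > 1:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        if ward_cost(clusters[i], clusters[j]) > max_cost:
            break  # remaining clusters are treated as distinct speakers
        a, b = clusters[i], clusters[j]
        n = a["n"] + b["n"]
        merged = {
            "members": a["members"] + b["members"],
            "mean": [(a["n"] * x + b["n"] * y) / n
                     for x, y in zip(a["mean"], b["mean"])],
            "n": n,
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return {key: g for g, c in enumerate(clusters) for key in c["members"]}


# Toy input: two recordings, each diarized into two local speakers.
local_speakers = {
    ("rec1", "A"): [0.0, 0.0],
    ("rec1", "B"): [5.0, 5.0],
    ("rec2", "A"): [0.1, -0.1],
    ("rec2", "B"): [5.2, 4.9],
}
global_ids = link_speakers(local_speakers, max_cost=1.0)
```

In this toy case the two "A" clusters receive one global index and the two "B" clusters another. The sketch omits any compensation for across-recording variability and any constraint that keeps speakers from the same recording apart, so it shows only the clustering step of linking.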

Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE/ACM Transactions on | 2016, Issue 11 | pp. 1935-1945 | 11 pages
Authors
Marc Ferràs; Srikanth Madikeri; Hervé Bourlard
Affiliation

Idiap Research Institute, Martigny, Switzerland

Original format: PDF
Text language: English
Keywords
Gaussian mixture model (GMM); i-vector; information bottleneck (IB); joint factor analysis (JFA); speaker diarization; speaker linking; Ward clustering

Similar literature

1. An Information Theoretic Approach to Speaker Diarization of Meeting Data [J]. Vijayasenan D., Valente F., Bourlard H. Audio, Speech, and Language Processing, IEEE Transactions on. 2009, Issue 7
2. Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis [J]. Cabanas-Molero P., Lucena M., Fuertes J. M., Multimedia Tools and Applications. 2018, Issue 20
3. Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations [J]. Yella S.H., Bourlard H. Audio, Speech, and Language Processing, IEEE/ACM Transactions on. 2014, Issue 12
4. Spherediar: An Effective Speaker Diarization System for Meeting Data [C]. Tuomas Kaseva, Aku Rouhe, Mikko Kurimo. IEEE Automatic Speech Recognition and Understanding Workshop. 2019
5. Use of speaker location features in meeting diarization [D]. Otterson, Scott. 2008
6. Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research [O]. Lukas Fürer, Nathalie Schenk, Volker Roth. 2020
7. Mutual Information Based Channel Selection for Speaker Diarization of Meetings Data [O]. Deepu Vijayasenan, Fabio Valente, Hervé Bourlard. 2013
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015
