Speaker diarization and linking of large corpora

Abstract

Performing speaker diarization of a collection of recordings, where speakers are uniquely identified across the database, is a challenging task. In this context, inter-session variability compensation and reasonable computation times are essential. In this paper we propose a two-stage system, composed of speaker diarization and speaker linking modules, that performs data-set-wide speaker diarization and handles both large volumes of data and inter-session variability compensation. The speaker linking system agglomeratively clusters speaker factor posterior distributions, obtained within the Joint Factor Analysis framework, that model the speaker clusters output by a standard speaker diarization system. The technique therefore inherently compensates for channel variability from recording to recording within the database. A threshold is used to obtain meaningful speaker clusters by cutting the dendrogram produced by the agglomerative clustering. We show that the Hotelling t-square statistic is an interesting distance measure for this task and input data, obtaining the best results and stability. The system is evaluated using three subsets of the AMI corpus involving different speaker and channel variabilities. We use the within-recording and across-recording diarization error rates (DER), cluster purity and cluster coverage to measure the performance of the proposed system. For some system setups, the across-recording DER is as low as the within-recording DER.
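
The linking stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the distance, so the code assumes each within-recording speaker cluster is summarized by a mean vector, per-dimension variances (a diagonal covariance) and a sample count, and it uses the textbook two-sample Hotelling t-square statistic with pooled variance. Average linkage is used for simplicity, whereas the keywords mention the Ward method. The names `t2_distance`, `link_speakers`, `items` and `threshold` are hypothetical.

```python
from itertools import combinations


def t2_distance(a, b):
    """Hotelling-style two-sample t-square statistic between two clusters.

    Assumption: each cluster is a dict with a 'mean' vector, per-dimension
    variances 'var' (diagonal covariance) and a sample count 'n'.
    """
    n1, n2 = a["n"], b["n"]
    total = 0.0
    for m1, m2, v1, v2 in zip(a["mean"], b["mean"], a["var"], b["var"]):
        # Pooled variance of the two clusters for this dimension.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        total += (m1 - m2) ** 2 / pooled
    return (n1 * n2) / (n1 + n2) * total


def link_speakers(items, threshold):
    """Average-linkage agglomerative clustering, stopped at `threshold`.

    Stopping once the closest pair is farther apart than `threshold`
    corresponds to cutting the dendrogram at that height.
    """
    n = len(items)
    # Pairwise distances between the original within-recording clusters.
    dist = {(i, j): t2_distance(items[i], items[j])
            for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x, y in combinations(range(len(clusters)), 2))
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy data: items 0 and 1 stand for the same speaker seen in two recordings,
# item 2 for a different speaker. All values are made up for illustration.
items = [
    {"mean": [0.0, 0.0], "var": [1.0, 1.0], "n": 50},
    {"mean": [0.1, -0.1], "var": [1.0, 1.0], "n": 50},
    {"mean": [3.0, 3.0], "var": [1.0, 1.0], "n": 50},
]
clusters = link_speakers(items, threshold=10.0)
print(clusters)  # [[0, 1], [2]]
```

The threshold trades cluster purity against cluster coverage: a lower value leaves more speakers unlinked, a higher value merges more of them. With SciPy available, the same cut could be done with `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion="distance")` on a precomputed condensed distance matrix.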

Bibliographic details

Source
2012 IEEE Workshop on Spoken Language Technology | 2012 | pp. 280-285 | 6 pages
Conference location: Miami, FL (US)
Authors
Marc Ferras; Hervé Bourlard
Affiliation
Idiap Research Institute, Martigny, Switzerland
Original format: PDF
Language: English
Chinese Library Classification: speech signal processing
Keywords
agglomerative clustering; joint factor analysis; speaker diarization; speaker linking; Ward method

Similar literature

1. Speaker Diarization and Linking of Meeting Data [J]. Marc Ferràs, Srikanth Madikeri, Hervé Bourlard. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, no. 11
2. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news [J]. Dabbabi Karim, Hajji Salah, Cherif Adnen. International Journal of Speech Technology, 2019, no. 4
3. Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information [J]. Ishiguro K., Yamada T., Araki S. IEEE Transactions on Audio, Speech, and Language Processing, 2012, no. 2
4. Speaker diarization and linking of large corpora [C]. Marc Ferras, Hervé Bourlard. IEEE Workshop on Spoken Language Technology, 2012
5. Automatic Speaker Recognition and Diarization in Co-Channel Speech [D]. Shokouhi, Navid. 2017
6. Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research [O]. Lukas Fürer, Nathalie Schenk, Volker Roth. 2020
7. Speaker diarization and linking of large corpora [O]. Marc Ferras, Hervé Bourlard. 2012
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015
