Foreign-language conference: 2012 IEEE Workshop on Spoken Language Technology

Speaker diarization and linking of large corpora

Abstract

Performing speaker diarization of a collection of recordings, where speakers are uniquely identified across the database, is a challenging task. In this context, inter-session variability compensation and reasonable computation times are essential. In this paper we propose a two-stage system composed of speaker diarization and speaker linking modules that performs data-set-wide speaker diarization and handles both large volumes of data and inter-session variability compensation. The speaker linking system agglomeratively clusters speaker factor posterior distributions, obtained within the Joint Factor Analysis framework, that model the speaker clusters output by a standard speaker diarization system. The technique therefore inherently compensates for channel variability effects from recording to recording within the database. A threshold is used to obtain meaningful speaker clusters by cutting the dendrogram produced by the agglomerative clustering. We show that the Hotelling T-squared statistic is a well-suited distance measure for this task and input data, yielding the best results and stability. The system is evaluated using three subsets of the AMI corpus involving different speaker and channel variabilities. We use the within-recording and across-recording diarization error rates (DER), cluster purity, and cluster coverage to measure the performance of the proposed system. Across-recording DERs as low as the within-recording DERs are obtained for some system setups.
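
The linking stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each diarization cluster is summarized by a Gaussian posterior over speaker factors (a mean vector and a covariance matrix), and it uses one common T-squared-style form, the squared difference of means weighted by the inverse of the summed covariances, as the pairwise distance; the abstract does not give the exact formula. The names `t2_distance` and `link_speakers`, the average-linkage choice, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def t2_distance(m1, c1, m2, c2):
    """T-squared-style distance between two Gaussian posteriors.

    Assumed form: (m1 - m2)^T (c1 + c2)^{-1} (m1 - m2).
    """
    diff = m1 - m2
    return float(diff @ np.linalg.solve(c1 + c2, diff))


def link_speakers(means, covs, threshold):
    """Agglomeratively cluster posteriors and cut the dendrogram.

    Returns one integer label per input cluster; equal labels mean
    "same speaker across recordings".
    """
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = t2_distance(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    # linkage expects a condensed (upper-triangular) distance vector.
    tree = linkage(squareform(dist), method="average")
    # Cut the dendrogram at the given distance threshold.
    return fcluster(tree, t=threshold, criterion="distance")


# Toy data: four diarization clusters from different recordings,
# two near one point in speaker-factor space and two near another.
means = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
         np.array([10.0, 10.0]), np.array([10.0, 10.2])]
covs = [0.1 * np.eye(2) for _ in means]
labels = link_speakers(means, covs, threshold=10.0)
print(labels)
```

With this toy data the first two clusters receive one label and the last two another. The threshold plays the role of the dendrogram cut mentioned in the abstract; in practice it would need tuning on held-out data.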
