Conference: IEEE Workshop on Spoken Language Technology

Speaker diarization and linking of large corpora

Abstract

Speaker diarization of a collection of recordings, in which each speaker must be uniquely identified across the whole database, is a challenging task. In this setting, both inter-session variability compensation and reasonable computation times must be addressed. In this paper we propose a two-stage system, composed of a speaker diarization module and a speaker linking module, that performs dataset-wide speaker diarization while handling large volumes of data and compensating for inter-session variability. The speaker linking system agglomeratively clusters speaker factor posterior distributions, obtained within the Joint Factor Analysis framework, that model the speaker clusters output by a standard speaker diarization system. The technique therefore inherently compensates for channel variability from recording to recording within the database. Meaningful speaker clusters are obtained by cutting the dendrogram produced by the agglomerative clustering at a threshold. We show that the Hotelling t-square statistic is a well-suited distance measure for this task and input data, giving the best results and stability. The system is evaluated on three subsets of the AMI corpus that involve different speaker and channel variabilities. We use the within-recording and across-recording diarization error rates (DER), cluster purity, and cluster coverage to measure the performance of the proposed system. For some system setups, the across-recording DER is as low as the within-recording DER.
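The linking stage described in the abstract (agglomerative clustering of per-cluster Gaussian summaries, followed by a threshold cut of the dendrogram) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the paper's method: the abstract does not give the exact form of the distance, the linkage rule, or the threshold, so the code uses the textbook two-sample Hotelling t-square with a pooled covariance, average linkage from SciPy, and hypothetical function names (`hotelling_t2`, `link_speakers`). The extraction of speaker factor posteriors within Joint Factor Analysis is not shown; each diarization cluster is assumed to be already summarized by a mean, a covariance, and a sample count.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def hotelling_t2(mean_a, cov_a, n_a, mean_b, cov_b, n_b):
    """Textbook two-sample Hotelling T^2 with pooled covariance (assumed form)."""
    diff = mean_a - mean_b
    pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)
    scale = n_a * n_b / (n_a + n_b)
    # solve() avoids forming the explicit inverse of the pooled covariance.
    return float(scale * diff @ np.linalg.solve(pooled, diff))


def link_speakers(means, covs, counts, threshold):
    """Cluster per-recording speaker summaries; return one label per input."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = hotelling_t2(means[i], covs[i], counts[i],
                             means[j], covs[j], counts[j])
            dist[i, j] = dist[j, i] = d
    # linkage() expects a condensed distance vector; average linkage is an
    # illustrative choice, not one stated in the abstract.
    tree = linkage(squareform(dist), method="average")
    # Cut the dendrogram: merges above `threshold` are not applied.
    return fcluster(tree, t=threshold, criterion="distance")


# Toy data: four diarization clusters that come from two underlying speakers.
means = [np.array([0.0, 0.0]), np.array([0.1, -0.1]),
         np.array([5.0, 5.0]), np.array([5.1, 4.9])]
covs = [np.eye(2) for _ in means]
counts = [50, 50, 50, 50]
labels = link_speakers(means, covs, counts, threshold=10.0)
```

In this toy setup the first two summaries receive one label and the last two another, because their pairwise distances fall far below and far above the threshold respectively. On real data the threshold would have to be tuned, for example against across-recording DER on held-out recordings.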
