Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Using Class Purity as Criterion for Speaker Clustering in Multi-Speaker Detection Tasks

Abstract

Speaker clustering is an important step in multi-speaker detection tasks, and its performance directly affects speaker detection performance. It is observed that the shorter the average length of the single-speaker speech segments produced by segmentation, the worse the subsequent speaker recognition performs. A reasonable way to improve multi-speaker detection performance is therefore to increase the average length of the single-speaker segments after segmentation, which is equivalent to merging as many segments that truly belong to the same speaker into one cluster as possible. In other words, the average class purity of the speaker clusters should be as large as possible. Accordingly, a speaker clustering algorithm based on a class purity criterion is proposed, in which a Reference Speaker Model (RSM) scheme is used to calculate the distance between speech segments, and the maximal class purity, or equivalently the minimal within-class dispersion, is taken as the criterion. Experiments on the NIST SRE 2006 database showed that, compared with the conventional Hierarchical Agglomerative Clustering (HAC) algorithm, for speech segments with average lengths of 2, 5 and 8 seconds, the proposed algorithm increased the valid class speech length by 2.7%, 3.8% and 4.6%, respectively, and increased the target speaker detection recall rate by 7.6%, 6.2% and 5.1%, respectively.
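The abstract names the ingredients (RSM-based distances between segments, and within-class dispersion as a stand-in for class purity) but not the exact procedure. The Python sketch below shows one way these pieces could fit together, under assumptions that do not come from the paper: each segment is already represented by a vector of scores against a set of reference speaker models (the scoring step is omitted and the values are hypothetical); segments are compared with Euclidean distance; dispersion is the sum of squared distances to cluster centroids; clusters are merged greedily (Ward-style) until a known number of clusters remains; and class purity is measured as the duration-weighted share of each cluster's speech that belongs to its dominant speaker. All function names are illustrative.

```python
import math
from collections import Counter


def rsm_distance(u, v):
    """Euclidean distance between two RSM score vectors (assumed metric)."""
    return math.dist(u, v)


def within_class_dispersion(vectors, clusters):
    """Sum, over clusters, of squared distances from each member to its centroid."""
    total = 0.0
    for members in clusters:
        dim = len(vectors[members[0]])
        centroid = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
        total += sum(rsm_distance(vectors[i], centroid) ** 2 for i in members)
    return total


def class_purity(clusters, speakers, durations):
    """Duration-weighted purity (assumed definition): fraction of all speech
    that belongs to the dominant true speaker of its assigned cluster."""
    dominant = 0.0
    for members in clusters:
        per_speaker = Counter()
        for i in members:
            per_speaker[speakers[i]] += durations[i]
        dominant += max(per_speaker.values())
    return dominant / sum(durations)


def cluster_min_dispersion(vectors, n_clusters):
    """Greedy agglomerative clustering: at each step, keep the pairwise merge
    that yields the smallest total within-class dispersion (Ward-style).
    Assumes the target number of clusters is known in advance."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
                trial = rest + [clusters[a] + clusters[b]]
                cost = within_class_dispersion(vectors, trial)
                if best is None or cost < best[0]:
                    best = (cost, trial)
        clusters = best[1]
    return clusters


# Toy data (hypothetical): six segments from two speakers, each scored
# against three reference speaker models; durations in seconds.
vectors = [
    (0.90, 0.10, 0.20), (0.80, 0.20, 0.10), (0.85, 0.15, 0.25),
    (0.10, 0.90, 0.30), (0.20, 0.80, 0.20), (0.15, 0.85, 0.35),
]
speakers = ["A", "A", "A", "B", "B", "B"]
durations = [2.0, 5.0, 8.0, 2.0, 5.0, 8.0]

found = cluster_min_dispersion(vectors, n_clusters=2)
mixed = [[0, 1, 3], [2, 4, 5]]  # a deliberately impure partition for comparison

print(sorted(sorted(c) for c in found), class_purity(found, speakers, durations))
print(class_purity(mixed, speakers, durations))
print(within_class_dispersion(vectors, found) < within_class_dispersion(vectors, mixed))
```

On this toy data the minimum-dispersion partition coincides with the true speakers, while the mixed partition has both higher dispersion and lower purity, which illustrates the equivalence the abstract assumes between maximal class purity and minimal within-class dispersion. Greedy merging of this kind is itself a form of hierarchical agglomerative clustering, so it should not be read as the paper's algorithm, which the abstract reports as outperforming conventional HAC.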