Robust speaker diarization for single channel recorded meetings

Abstract

This thesis describes research into speaker diarization for recorded meetings. It explores the algorithms and the implementation of an off-line speaker segmentation and clustering system for meetings that have been recorded using one microphone. Speaker diarization is defined as the process of partitioning a spoken recording into speaker-homogeneous regions. The meeting recordings contain different kinds of noise, and the length of the noise varies significantly. The average speech turn is short, and the number of speakers is unknown. To reduce the influence of these aural characteristics on the performance of the speaker diarization system, this thesis proposes four new algorithms. First, a new speech activity detection method adjusts the complexity of the non-speech model according to the noise length ratio. Second, a new speaker change point detection measure, derived from Fisher Linear Discriminant Analysis, helps detect short speaker turns. Third, the Equal Weight Penalty Criterion is formulated as a new model complexity selection criterion for training both the speakers' models and the Universal Background Model (UBM). It contains two penalty terms: one penalizes the model dimensions and removes mixtures with small mixing probability, and the other penalizes the Kullback-Leibler divergence between the prior and posterior distributions of the mixing parameters, removing components that share the same location. The criterion can be adjusted through the prior distribution parameter delta, which controls how many components are used in the model. Fourth, a weight and mean adaptation method is developed to adapt potential speaker models from the UBM. In addition, a termination scheme for potential speaker merging, based on Normalized Cuts, is introduced into the system.
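To make the idea of a Fisher-style change point measure concrete, the following is a minimal sketch and not the measure derived in the thesis, whose exact form the abstract does not give. It assumes a one-dimensional feature track and scores each candidate boundary with the classical Fisher criterion (squared gap between the window means divided by the summed window variances). The names `fisher_ratio` and `best_change_point`, the window length, and the toy data are all hypothetical.

```python
from statistics import fmean, pvariance


def fisher_ratio(left, right, eps=1e-9):
    """Classical 1-D Fisher criterion: squared mean gap over summed variances."""
    gap = fmean(left) - fmean(right)
    return gap * gap / (pvariance(left) + pvariance(right) + eps)


def best_change_point(frames, window):
    """Return the index whose left and right windows are most separable."""
    scores = {
        t: fisher_ratio(frames[t - window:t], frames[t:t + window])
        for t in range(window, len(frames) - window + 1)
    }
    return max(scores, key=scores.get)


# Toy 1-D feature track (an assumption for illustration only):
# one speaker near 0.0, another near 1.0, with the change at index 6.
frames = [0.1, -0.1, 0.0, 0.2, -0.2, 0.1, 1.0, 0.9, 1.1, 1.2, 0.8, 1.0]
change = best_change_point(frames, window=4)
print(change)
```

Because the score normalizes the mean gap by the within-window variance, short windows can still give a sharp peak when the two sides are internally consistent, which is one plausible reason a discriminant-based measure suits short speaker turns.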
Combining all the new techniques derived in this thesis reduced the error rate of the baseline system from 18.61% to 9.24% on the development set, from 18.89% to 10.50% on the AMI corpus evaluation set, and from 21.35% to 15.48% on the ISL corpus evaluation set. With the Normalized Cuts based termination scheme for potential speaker merging, the error rate was reduced from 18.61% to 10.33% on the development set, from 18.89% to 9.99% on the AMI corpus evaluation set, and from 21.35% to 13.70% on the ISL corpus evaluation set.
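The absolute figures above can also be read as relative reductions. The short sketch below only re-expresses the error rates quoted in the abstract; the dictionary layout and the helper name `relative_reduction` are illustrative assumptions, not part of the thesis.

```python
# (baseline %, improved %) error rates as quoted in the abstract.
combined = {
    "development": (18.61, 9.24),
    "AMI evaluation": (18.89, 10.50),
    "ISL evaluation": (21.35, 15.48),
}
with_normalized_cuts = {
    "development": (18.61, 10.33),
    "AMI evaluation": (18.89, 9.99),
    "ISL evaluation": (21.35, 13.70),
}


def relative_reduction(baseline, improved):
    """Relative error reduction in percent of the baseline error."""
    return 100.0 * (baseline - improved) / baseline


for label, table in (("combined", combined),
                     ("with Normalized Cuts", with_normalized_cuts)):
    for dataset, (baseline, improved) in table.items():
        drop = relative_reduction(baseline, improved)
        print(f"{label:>22} | {dataset:<15} | {drop:5.1f}% relative")
```

Read this way, the combined system roughly halves the baseline error on the development set, while the Normalized Cuts variant gives up some of that gain on the development set and improves on both evaluation sets.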

Bibliographic record

  • Author

    Fu Rong

  • Year 2009
  • Original format PDF
  • Language English
