Journal: IEEE Transactions on Audio, Speech, and Language Processing

Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

Abstract

In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
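The clustering step described in the abstract (a Bayesian GMM applied to PCA-processed i-vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `PCA` and `BayesianGaussianMixture` as stand-ins, uses synthetic vectors in place of real i-vectors, and picks hypothetical values for the dimensions and the upper bound on speakers. The function name `cluster_segments` is also hypothetical. The re-segmentation step and the iteration between clustering and re-segmentation are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture


def cluster_segments(ivectors, n_pca_dims=3, max_speakers=6, seed=0):
    """Return one integer cluster label per segment vector.

    n_pca_dims and max_speakers are illustrative assumptions,
    not values taken from the paper.
    """
    # Project the segment vectors onto their leading principal components.
    projected = PCA(n_components=n_pca_dims, random_state=seed).fit_transform(ivectors)

    # Variational Bayesian GMM: max_speakers is only an upper bound, and
    # components that the data does not support receive weights near zero.
    bgmm = BayesianGaussianMixture(
        n_components=max_speakers,
        covariance_type="full",
        weight_concentration_prior_type="dirichlet_distribution",
        max_iter=500,
        random_state=seed,
    )
    return bgmm.fit_predict(projected)


# Synthetic stand-in for i-vectors: 3 simulated speakers, 40 segments each,
# 20 dimensions. Real i-vectors would come from a factor-analysis front-end.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 20))
true_speakers = np.repeat(np.arange(3), 40)
ivectors = centers[true_speakers] + rng.normal(scale=0.5, size=(120, 20))

labels = cluster_segments(ivectors)
```

Each entry of `labels` is the hypothesized speaker cluster for one segment. In a full system of the kind the abstract describes, these assignments would feed a re-segmentation step, and the two steps would alternate.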
