Journal: IEEE Transactions on Audio, Speech, and Language Processing

Speaker Clustering Using Decision Tree-Based Phone Cluster Models With Multi-Space Probability Distributions

Abstract

This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Because pitch is highly speaker-dependent and beneficial for speaker identification, decision trees based on multi-space probability distributions (MSDs) are constructed; MSDs can model pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees and so construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models into speaker-adapted phone cluster models according to the input speech segment. Finally, an agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, to perform speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for combining model parameters. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
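
The final agglomerative step can be illustrated with a minimal sketch. It is not the paper's method: each speech segment is represented here by a single diagonal Gaussian instead of an MLLR-adapted DT-PCM, the distance is a symmetric Kullback-Leibler divergence, merged models are estimated by moment matching, and merging stops at a fixed threshold. All of these choices, and every function name below, are assumptions made for illustration, since the abstract does not specify the distance measure, the merging estimator, or the stopping rule.

```python
from itertools import combinations


def fit_gaussian(frames):
    """Fit a diagonal Gaussian to a list of equal-length feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so that later divisions are always safe.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return {"n": n, "mean": mean, "var": var}


def merge(a, b):
    """Moment-matched merge of two diagonal Gaussians, weighted by frame count.

    This stands in for the paper's model-merging estimator (an assumption).
    """
    n = a["n"] + b["n"]
    wa, wb = a["n"] / n, b["n"] / n
    mean = [wa * ma + wb * mb for ma, mb in zip(a["mean"], b["mean"])]
    var = [wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)
           for ma, va, mb, vb, m in zip(a["mean"], a["var"],
                                        b["mean"], b["var"], mean)]
    return {"n": n, "mean": mean, "var": var}


def symmetric_kl(a, b):
    """Symmetric KL divergence, KL(a||b) + KL(b||a), for diagonal Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(a["mean"], a["var"], b["mean"], b["var"]):
        total += 0.5 * (va / vb + vb / va - 2.0)
        total += 0.5 * (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)
    return total


def agglomerative_cluster(models, threshold):
    """Merge the closest pair of clusters until the closest pair exceeds threshold.

    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [{"members": [i], "model": m} for i, m in enumerate(models)]
    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), symmetric_kl(clusters[i]["model"], clusters[j]["model"]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = {
            "members": clusters[i]["members"] + clusters[j]["members"],
            "model": merge(clusters[i]["model"], clusters[j]["model"]),
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]


# Toy data: four segments of 2-D features from two hypothetical speakers.
segments = [
    [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2]],
    [[0.1, -0.1], [0.0, 0.3], [-0.2, 0.0]],
    [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]],
    [[5.1, 5.2], [4.9, 4.8], [5.0, 5.1]],
]
segment_models = [fit_gaussian(s) for s in segments]
clusters_found = agglomerative_cluster(segment_models, threshold=50.0)
print(clusters_found)
```

In the pipeline the abstract describes, each entry of `segment_models` would instead be a set of speaker-adapted phone cluster models for one segment, and the distance and merge functions would operate on those models.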
