Journal: International Journal of Speech Technology

Speaker diarization system using MKMFCC parameterization and WLI-fuzzy clustering

Abstract

Speaker diarization is the process of determining "who spoke when?", assigning speaker labels to the time regions in which each speaker spoke. In previous work, a model-based speaker diarization system was built using tangential weighted Mel frequency cepstral coefficients as the feature parameter for voice activity detection and the Lion optimization algorithm for clustering the audio streams into speaker groups. In this paper, a speaker diarization system is proposed using multiple kernel weighted Mel frequency cepstral coefficient (MKMFCC) parameterization and Wu-and-Li Index (WLI)-fuzzy clustering. First, an MKMFCC, which uses multiple kernels such as the tangential and exponential kernels to weight the MFCCs, is proposed for feature parameterization. Second, a clustering algorithm called WLI-fuzzy clustering is proposed for grouping the segments that belong to the same speaker. The proposed system is evaluated on the publicly available ELSDSR corpus, using audio with seven different speakers. Performance is analysed using diarization error rate, F-measure and false alarm rate. The results show that the proposed system performs better at tracking the active speakers among multiple speakers, with improved tracking accuracy.
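
The abstract names diarization error rate as an evaluation measure but does not define it. As a rough illustration only, the sketch below uses the commonly cited definition (missed speech, false alarm and speaker confusion, divided by total reference speech) at the frame level. It is not the paper's evaluation code. The function name `frame_level_der` and the label format are hypothetical, and the sketch assumes one speaker per frame, no scoring collar, and hypothesis labels that have already been mapped to the reference speaker names.

```python
from typing import Optional, Sequence


def frame_level_der(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Simplified frame-level diarization error rate.

    Each element is a speaker label for one frame, or None for non-speech.
    Assumes hypothesis labels are already mapped to reference speakers.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1          # speech frame labelled as non-speech
            elif hyp != ref:
                confusion += 1       # speech frame given to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # non-speech frame labelled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech
```

For a six-frame toy input with four reference speech frames, of which one is missed and one is confused, plus one false-alarm frame, the function returns 0.75.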
