Journal: Sadhana

Limited data speaker identification

Abstract

This paper addresses the task of identifying a speaker from limited training and testing data. A speaker identification system is viewed as four stages, namely analysis, feature extraction, modelling and testing, and its performance depends on the techniques employed at each stage. As different experiments demonstrate, when training and testing data are limited, the existing techniques at each stage do not perform well because there is too little data. This work demonstrates the following: multiple frame size and rate (MFSR) analysis improves the analysis stage; a combination of mel frequency cepstral coefficients (MFCC), their temporal derivatives $(\Delta, \Delta\Delta)$, linear prediction residual (LPR) and linear prediction residual phase (LPRP) features improves the feature extraction stage; and a combination of learning vector quantization (LVQ) and the Gaussian mixture model–universal background model (GMM–UBM) improves the modelling stage. Performance improves further when the proposed techniques are integrated at their respective stages and the evidence from them is combined at the testing stage. To achieve this, we propose strength voting (SV), weighted Borda count (WBC) and supporting systems (SS) as combining methods at the abstract, rank and measurement levels, respectively. Finally, the proposed hierarchical combination (HC) method, which integrates these three methods, provides a significant improvement in performance. Based on these explorations, this work proposes a scheme for speaker identification under limited training and testing data.
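The rank-level combination mentioned in the abstract can be illustrated with a generic weighted Borda count. The abstract does not give the paper's exact weighting scheme, so the following is only a minimal sketch under stated assumptions: the function name `weighted_borda`, the speaker labels, and the subsystem weights are all hypothetical, and each subsystem is assumed to return a full ranking of the same candidate speakers.

```python
from collections import defaultdict


def weighted_borda(rankings, weights):
    """Combine ranked speaker lists (best first) with a weighted Borda count.

    rankings: one ranked list of speaker IDs per subsystem.
    weights:  one non-negative weight per subsystem (hypothetical values,
              not taken from the paper).
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, speaker in enumerate(ranking):
            # Classic Borda points: n-1 for the top rank, 0 for the last,
            # scaled by how much this subsystem is trusted.
            scores[speaker] += weight * (n - 1 - position)

    # Highest combined score first; ties broken by speaker ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical rankings from three subsystems over three enrolled speakers.
rankings = [
    ["spk_a", "spk_b", "spk_c"],
    ["spk_b", "spk_a", "spk_c"],
    ["spk_b", "spk_c", "spk_a"],
]
weights = [0.5, 0.3, 0.2]

combined = weighted_borda(rankings, weights)
print(combined)
```

In this toy run the first subsystem alone would pick `spk_a`, but the combined ranking places `spk_b` first because the other two subsystems agree on it. This is the kind of effect a rank-level combiner is meant to capture.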
