Journal: Sadhana

Limited data speaker identification



Abstract

In this paper, the task of identifying the speaker using limited training and testing data is addressed. A speaker identification system is viewed as four stages, namely analysis, feature extraction, modelling and testing. The speaker identification performance depends on the techniques employed in these stages. As demonstrated by different experiments, under limited training and testing data conditions, existing techniques in each stage do not provide good performance because too little data is available. This work demonstrates the following: multiple frame size and rate (MFSR) analysis provides improvement in the analysis stage; a combination of mel frequency cepstral coefficients (MFCC), their temporal derivatives (Δ, ΔΔ), linear prediction residual (LPR) and linear prediction residual phase (LPRP) features provides improvement in the feature extraction stage; and a combination of learning vector quantization (LVQ) and the Gaussian mixture model-universal background model (GMM-UBM) provides improvement in the modelling stage. The performance is further improved by integrating the proposed techniques at the respective stages and combining the evidence from them at the testing stage. To achieve this, we propose strength voting (SV), weighted Borda count (WBC) and supporting systems (SS) as combining methods at the abstract, rank and measurement levels, respectively. Finally, the proposed hierarchical combination (HC) method integrating these three methods provides significant improvement in the performance. Based on these explorations, this work proposes a scheme for speaker identification under limited training and testing data.
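To make the rank-level combination concrete, the following is a minimal sketch of a generic weighted Borda count, not necessarily the exact WBC formulation used in the paper. It assumes each classifier outputs a full ranking of the same speakers, best first, and that each classifier has a scalar weight. The function name `weighted_borda_count`, the speaker IDs and the weight values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_borda_count(rankings, weights):
    """Fuse ranked speaker lists from several classifiers.

    rankings: list of lists, each an ordering of speaker IDs, best first.
    weights:  one non-negative weight per classifier (hypothetical values,
              e.g. a confidence assigned to each classifier).
    Returns the speaker IDs sorted by fused score, best first.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, speaker in enumerate(ranking):
            # Top rank earns n - 1 points, last rank earns 0,
            # scaled by the weight of the classifier that produced it.
            scores[speaker] += weight * (n - 1 - position)
    # Highest total first; ties are broken by speaker ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Three hypothetical classifiers ranking three hypothetical speakers.
rankings = [
    ["spk_a", "spk_b", "spk_c"],
    ["spk_b", "spk_a", "spk_c"],
    ["spk_b", "spk_c", "spk_a"],
]
weights = [0.5, 0.3, 0.2]
fused = weighted_borda_count(rankings, weights)
print(fused[0])  # identified speaker under this fusion rule
```

With equal weights this reduces to the plain Borda count. How the weights are chosen is left open here, since the abstract does not specify it.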
