Journal: IEEE Transactions on Audio, Speech, and Language Processing

Comparison of Speaker Adaptation Methods as Feature Extraction for SVM-Based Speaker Recognition



Abstract

In recent years, the speaker recognition field has made extensive use of speaker adaptation techniques. Adaptation allows speaker model parameters to be estimated using less speech data than is needed for maximum-likelihood (ML) training. The maximum a posteriori (MAP) and maximum-likelihood linear regression (MLLR) techniques have typically been used for adaptation. Recently, MAP and MLLR adaptation have been incorporated in the feature extraction stage of support vector machine (SVM)-based speaker recognition systems. Two approaches to feature extraction use an SVM to classify either the MAP-adapted Gaussian mean vector parameters (GSV-SVM) or the MLLR transform coefficients (MLLR-SVM). In this paper, we provide an experimental analysis of the GSV-SVM and MLLR-SVM approaches. We largely focus on the latter by exploring constrained and unconstrained transforms and different choices of the acoustic model. A channel-compensated front-end is used to prevent the MLLR transforms from adapting to channel components in the speech data. Additional acoustic models were trained using speaker adaptive training (SAT) to better estimate the speaker MLLR transforms. We provide results on the NIST 2005 and 2006 Speaker Recognition Evaluation (SRE) data and fusion results on the SRE 2006 data. The results show that using the compensated front-end, SAT models, and multiple regression classes brings major performance improvements.
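
To make the GSV-SVM idea concrete, the following is a minimal sketch of how MAP-adapted Gaussian means can be stacked into a single feature vector for an SVM. It is not taken from the paper: it assumes a diagonal-covariance background GMM and the commonly used relevance-factor form of mean-only MAP adaptation, and the function names (`map_adapt_means`, `supervector`) and the default `relevance=16.0` are illustrative assumptions.

```python
import numpy as np

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM (illustrative).

    frames:    (T, D) acoustic feature vectors from one speaker
    weights:   (K,)   background-model mixture weights
    means:     (K, D) background-model means
    variances: (K, D) background-model diagonal variances
    relevance: assumed relevance factor controlling the prior's strength
    """
    # Log-density of every frame under every Gaussian: shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    # Posterior probability of each component given each frame.
    log_post = np.log(weights)[None, :] + log_gauss
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)

    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)                      # (K,)
    first = post.T @ frames                   # (K, D)
    data_mean = first / np.maximum(n, 1e-10)[:, None]

    # Interpolate between the data mean and the prior mean.
    alpha = (n / (n + relevance))[:, None]
    return alpha * data_mean + (1.0 - alpha) * means

def supervector(adapted_means):
    """Concatenate the K adapted mean vectors into one K*D feature vector."""
    return adapted_means.reshape(-1)
```

Each utterance then yields one fixed-length vector regardless of its duration, which is what allows a standard SVM to be trained on it. The MLLR-SVM approach is analogous in spirit, except that the feature vector is built from the coefficients of the estimated MLLR transform instead of the adapted means.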
