...
Journal: Digital Signal Processing

A fast and scalable hybrid FA/PPCA-based framework for speaker recognition


Abstract

A text-independent speaker recognition system is proposed that combines Probabilistic Principal Component Analysis (PPCA) with the conventional i-vector modeling technique. In this framework, the total variability space (TVS) is estimated using PPCA, while the i-vectors of target speakers and test utterances are extracted using the conventional method. This appreciably reduces development time, while the time required for training and testing remains unchanged. In this paper, an algorithmic optimization of the PPCA EM algorithm is also developed; it is observed to provide a speed-up of 3.7×. To simplify the testing procedure, two approximations are proposed for use within this framework. The first assumes a covariance matrix computed within the PPCA framework. The second avoids inverting the precision matrix of the i-vector. Compared with the baseline i-vector extraction procedure, these approximations yield speed gains at the cost of some deterioration in performance, measured by the Equal Error Rate (EER). Among the proposed techniques, the best trade-off is a speed-up of 81.2× with an absolute performance deterioration of 0.7%. Speaker recognition performance is studied on the telephone conditions of the benchmark NIST SRE 2010 dataset, using systems built on Mel Frequency Cepstral Coefficient (MFCC) features. A performance trade-off is observed when the proposed approximations are used. The scalability of these trade-offs is tested on the Mel Filterbank Slope (MFS) feature. The trade-offs observed with the approximations are reduced when the two systems are fused.
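
The abstract rests on two standard computations. The sketch below shows them under stated assumptions; it is not the authors' code. `ppca_em` is textbook PPCA fitted by EM (Tipping and Bishop), assuming each development utterance is summarized by a fixed-length supervector, so that the loading matrix W plays the role of the total variability matrix. `extract_ivector` computes the conventional i-vector posterior mean w = (I + TᵀΣ⁻¹NT)⁻¹ TᵀΣ⁻¹F̃, assuming diagonal UBM covariances and a component-major supervector layout. Both function names are hypothetical. Solving the R×R linear system instead of forming an explicit inverse is ordinary numerical practice. It is not the paper's second approximation, whose details the abstract does not give. The paper's 3.7× EM optimization is also not reproduced here.

```python
import numpy as np


def ppca_em(X, q, n_iter=100, seed=0):
    """Fit PPCA by EM (Tipping & Bishop).

    X: (N, d) data, one row per observation (assumed here: one supervector
       per development utterance). q: latent dimension (assumed TV rank).
    Returns mean mu (d,), loadings W (d, q) and isotropic noise variance.
    """
    N, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    Xc = X - mu                                  # centred data
    W = rng.standard_normal((d, q))              # random initial loadings
    sigma2 = 1.0
    total_sq = np.sum(Xc ** 2)                   # sum_n ||x_n - mu||^2 (fixed)
    for _ in range(n_iter):
        # E-step: only the small q x q matrix M = W^T W + sigma2*I is inverted.
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ M_inv                      # row n is E[z_n]^T
        sum_Ezz = N * sigma2 * M_inv + Ez.T @ Ez  # sum_n E[z_n z_n^T]
        # M-step: W = (Xc^T Ez) sum_Ezz^{-1}, via a solve (sum_Ezz is symmetric).
        W = np.linalg.solve(sum_Ezz, Ez.T @ Xc).T
        sigma2 = (total_sq
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (N * d)
    return mu, W, sigma2


def extract_ivector(T, sigma_diag, n_c, f_c):
    """Conventional i-vector point estimate (posterior mean).

    T: (C*F, R) total variability matrix; sigma_diag: (C*F,) diagonal UBM
    covariances; n_c: (C,) zeroth-order stats; f_c: (C*F,) centred
    first-order stats, stacked component by component (assumed layout).
    """
    CF, R = T.shape
    F = CF // len(n_c)
    N_diag = np.repeat(n_c, F)                   # N_c repeated for each feature dim
    TtSinv = T.T / sigma_diag                    # T^T Sigma^{-1}, shape (R, C*F)
    precision = np.eye(R) + (TtSinv * N_diag) @ T  # I + T^T Sigma^{-1} N T
    # Solve the R x R system rather than inverting the precision matrix.
    return np.linalg.solve(precision, TtSinv @ f_c)
```

In `ppca_em` each iteration costs O(Ndq) and inverts only a q×q matrix. This is why PPCA-based TVS estimation can be cheaper than conventional total-variability training, which needs per-utterance Baum-Welch statistics in the E-step.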
