International Conference on Signal Processing and Communication Systems

Feature Extraction from Temporal Phase for Speaker Recognition


Abstract

Feature extraction is important for pattern recognition problems in speech research. Most feature extraction methods primarily exploit spectral information rather than phase information. Even though phase is an important characteristic of the speech signal, it remains little exploited. In this work, in addition to state-of-the-art Mel Frequency Cepstral Coefficients (MFCC), we use features derived from the temporal phase (T-Phase) of the speech signal for speaker recognition. The proposed method extracts Linear Prediction Coefficients (LPC) from the T-Phase of the speech signal at the frame level. Experiments are carried out on the standard NIST 2002 Speaker Recognition Evaluation (SRE) using a standard Gaussian Mixture Model - Universal Background Model (GMM-UBM) system. Score-level fusion of the MFCC and T-Phase feature sets gives a 76.18% identification rate, an improvement of 4% and 8% over MFCC and LPC alone, respectively. In addition, score-level fusion reduces the Equal Error Rate (EER) by 2% and 4% compared with MFCC and LPC alone, respectively.
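The frame-level pipeline described in the abstract can be sketched roughly as follows. The abstract does not define T-Phase precisely, so this sketch assumes it is the cosine of the instantaneous phase of the analytic signal. The frame length, hop size, LPC order, fusion weight, and all function names (`temporal_phase`, `lpc`, `tphase_lpc_features`, `fuse_scores`) are illustrative assumptions, not values or names taken from the paper.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (discrete Hilbert-transform construction)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def temporal_phase(x):
    """ASSUMPTION: T-Phase is modeled as cos(instantaneous phase)."""
    z = analytic_signal(np.asarray(x, dtype=float))
    return np.cos(np.angle(z))

def lpc(frame, order):
    """LPC by the autocorrelation method with Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1 (prediction-error filter coefficients).
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # degenerate (near-silent) frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def tphase_lpc_features(signal, frame_len=400, hop=160, order=12):
    """One LPC vector per Hamming-windowed frame of the temporal phase.

    frame_len, hop and order are hypothetical defaults (25 ms / 10 ms at 16 kHz).
    """
    phase = temporal_phase(signal)
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(phase) - frame_len + 1, hop):
        frame = phase[start:start + frame_len] * window
        feats.append(lpc(frame, order)[1:])  # drop the leading 1
    return np.array(feats)

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Score-level fusion as a weighted sum; the weight is an assumption."""
    return weight * np.asarray(scores_a) + (1.0 - weight) * np.asarray(scores_b)
```

In a GMM-UBM setup, `scores_a` and `scores_b` would be the per-speaker log-likelihood scores from the MFCC and T-Phase systems, typically normalized to a common range before fusion.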
