IEICE Transactions on Information and Systems

Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization


Abstract

This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands so that noise distortions do not spread over the entire feature space. To capture the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower-frequency subband at each decomposition level are calculated. In addition, a hard-threshold technique is applied to the lower-frequency subband at each decomposition level to eliminate the effect of noise interference. Furthermore, cepstral-domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. To effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks, capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated on the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than that of vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficient (MFCC) features, in both clean and noisy environments. Satisfactory performance can also be achieved in low-SNR environments.
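The abstract outlines a concrete feature pipeline: DWT subband decomposition, hard thresholding of the lower-frequency band, LPCC extraction, cepstral-domain normalization, and a PCA-derived eigen-codebook. The following is a minimal Python sketch of that pipeline, not the authors' implementation; the wavelet family (db4), threshold value, LPC/cepstrum orders, frame layout, and number of retained eigenvectors are illustrative assumptions, and the single per-speaker PCA summary only stands in for the paper's multilayer eigen-codebook VQ.

```python
# Minimal sketch of the multiband feature pipeline described in the abstract.
# All parameter choices (wavelet, threshold, orders, frame size) are assumptions.
import numpy as np
import pywt                               # PyWavelets, for the DWT
from scipy.linalg import solve_toeplitz


def lpcc(frame, lpc_order=12, n_ceps=12):
    """LPC via the autocorrelation method, then the LPC-to-cepstrum recursion."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:lpc_order], r[1:lpc_order + 1])   # LPC coefficients
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= lpc_order else 0.0
        for k in range(1, n):
            if n - k <= lpc_order:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def multiband_lpcc(frame, levels=2, threshold=0.02):
    """At each DWT level, hard-threshold the lower-frequency (approximation)
    subband and extract LPCC from it."""
    feats, approx = [], frame
    for _ in range(levels):
        approx, _detail = pywt.dwt(approx, "db4")
        approx = pywt.threshold(approx, threshold, mode="hard")
        feats.append(lpcc(approx))
    return np.concatenate(feats)


def cepstral_normalize(features):
    """Per-dimension mean/variance normalization over all frames (CMVN-style)."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)


def eigen_codebook(features, n_components=8):
    """Toy 'eigen-codebook': leading eigenvectors of the feature covariance,
    standing in for the paper's PCA-evaluated multilayer codebooks."""
    centered = features - features.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return features.mean(axis=0), eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)      # stand-in for one utterance at 16 kHz
    frames = speech.reshape(-1, 400)         # 25 ms frames, no overlap
    feats = np.vstack([multiband_lpcc(f) for f in frames])
    feats = cepstral_normalize(feats)
    mean_vec, basis = eigen_codebook(feats)
    print(feats.shape, basis.shape)          # (40, 24) (24, 8)
```

In the full system each speaker would hold a multilayer codebook per subband and identification would select the speaker whose codebook yields the lowest quantization distortion; the sketch only illustrates how the multiband features and an eigen-based summary could be computed.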
