
Automatic speaker recognition dynamic feature identification and classification using distributed discrete cosine transform based mel frequency cepstral coefficients and fuzzy vector quantization


Abstract

The Mel-Frequency Cepstral Coefficient (MFCC) method is a leading approach to speech feature extraction, and current research aims to improve its performance. This thesis presents a novel approach to MFCC feature extraction and classification and applies it to speaker recognition. The proposed feature extraction method is based on the distributed Discrete Cosine Transform (DCT-II), which it uses to compute the dynamic features used during speaker recognition. The new algorithm combines this DCT-II based MFCC feature extraction method with a Fuzzy Vector Quantization (FVQ) data clustering classifier. The proposed automatic speaker recognition (ASR) algorithm uses a recently introduced variation of MFCC, known as Delta-Delta MFCC (DDMFCC), to identify the dynamic features used for speaker recognition. A series of experiments was performed using three feature extraction methods: (1) conventional MFCC; (2) DDMFCC; and (3) DCT-II based DDMFCC. The experiments were then extended to four data clustering classifiers: (1) K-means Vector Quantization; (2) Linde-Buzo-Gray (LBG) Vector Quantization; (3) FVQ; and (4) Gaussian Mixture Model (GMM). The National Institute of Standards and Technology (NIST) 2004 Speaker Recognition Evaluation (SRE 04) corpus provided the speaker source data for the experiments. Among the vector quantization based classifiers, the combination of DCT-II based MFCC, Delta MFCC (DMFCC) and DDMFCC features with FVQ gave the lowest Equal Error Rate (EER). The speaker verification tests showed an overall improvement in performance for the new ASR system.
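To make the feature side concrete, here is a minimal pure-Python sketch of how DCT-II enters MFCC computation. It assumes log mel filterbank energies per frame are already available (framing, windowing, FFT and the mel filterbank are omitted). The abstract does not spell out the "distributed DCT-II" formulation, so `dct_delta` is a hypothetical reading: it applies DCT-II along time over a short window of frames and uses coefficient 1 as a slope estimate. The conventional regression delta is shown alongside for comparison. All function names are illustrative assumptions, not the thesis's code.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (O(N^2), written for clarity)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(xi * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, xi in enumerate(x))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def static_mfcc(log_mel_frames, num_ceps=13):
    """Static cepstra: DCT-II across each frame's log mel energies, truncated."""
    return [dct2(frame)[:num_ceps] for frame in log_mel_frames]

def _window(frames, t, d, width):
    """Coefficient d from frames t-width..t+width, repeating the edge frames."""
    last = len(frames) - 1
    return [frames[min(max(t + j, 0), last)][d] for j in range(-width, width + 1)]

def regression_delta(frames, width=2):
    """Conventional delta: least-squares slope of each coefficient over 2*width+1 frames."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[0])):
            w = _window(frames, t, d, width)  # w[width + n] is frame t + n
            row.append(sum(n * (w[width + n] - w[width - n]) for n in range(1, width + 1)) / denom)
        out.append(row)
    return out

def dct_delta(frames, width=2):
    """HYPOTHETICAL DCT-II dynamic feature: DCT-II along time over 2*width+1 frames.
    Coefficient 1 is an antisymmetric, cosine-weighted slope; it is negated so a
    rising coefficient trajectory yields a positive value."""
    return [[-dct2(_window(frames, t, d, width))[1] for d in range(len(frames[0]))]
            for t in range(len(frames))]

def stack_features(static, delta_fn=regression_delta):
    """Concatenate static, delta and delta-delta coefficients for every frame."""
    delta = delta_fn(static)
    delta2 = delta_fn(delta)
    return [s + d + dd for s, d, dd in zip(static, delta, delta2)]
```

Passing `delta_fn=dct_delta` to `stack_features` gives a DCT-based counterpart of the static + DMFCC + DDMFCC vector; both delta variants cost the same window of frames, so they can be swapped without changing the rest of the pipeline.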
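The FVQ classifier and the EER metric named in the abstract can be sketched in pure Python as well. The abstract does not define its FVQ variant, so this sketch assumes fuzzy c-means (FCM) codebook training with fuzzifier `m = 2` and scores a test utterance by average nearest-codeword distortion, as in conventional VQ speaker identification. The EER helper assumes scores where higher means "same speaker" (for example, negated distortion). Every function name here is hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_fvq_codebook(vectors, size, m=2.0, iters=50, seed=0):
    """Fuzzy c-means codebook: each vector belongs to every codeword with a
    membership in [0, 1], instead of the hard assignment used by K-means/LBG."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, size)]
    expo = 1.0 / (m - 1.0)  # (d_i/d_k)^(2/(m-1)) expressed with squared distances
    for _ in range(iters):
        u = [[0.0] * len(vectors) for _ in range(size)]
        for j, x in enumerate(vectors):
            d = [sq_dist(x, c) for c in centers]
            hits = [i for i, di in enumerate(d) if di == 0.0]
            if hits:  # vector lies exactly on codeword(s): share membership among them
                for i in hits:
                    u[i][j] = 1.0 / len(hits)
            else:
                for i in range(size):
                    u[i][j] = 1.0 / sum((d[i] / dk) ** expo for dk in d)
        for i in range(size):  # move each codeword to its membership^m weighted mean
            w = [u[i][j] ** m for j in range(len(vectors))]
            total = sum(w)
            if total > 0.0:
                centers[i] = [sum(wj * v[k] for wj, v in zip(w, vectors)) / total
                              for k in range(len(vectors[0]))]
    return centers

def distortion(vectors, codebook):
    """Average squared distance from each test vector to its nearest codeword."""
    return sum(min(sq_dist(x, c) for c in codebook) for x in vectors) / len(vectors)

def identify(test_vectors, codebooks):
    """Closed-set identification: pick the speaker whose codebook gives least distortion."""
    return min(codebooks, key=lambda name: distortion(test_vectors, codebooks[name]))

def equal_error_rate(genuine, impostor):
    """Sweep thresholds over all scores and return the mean of FAR and FRR
    at the threshold where the two rates are closest."""
    best_gap, eer = None, None
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)
        far = sum(s >= thr for s in impostor) / len(impostor)
        if best_gap is None or abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

The fuzzifier `m` controls how soft the clustering is: as `m` approaches 1 the memberships become nearly hard and FCM behaves like K-means, while larger values spread each training vector over more codewords.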

Bibliographic details

  • Author: Hossan M
  • Year: 2011
  • Original format: PDF
