Journal: IEE Proceedings. Part K

Speaker recognition using hidden Markov models, dynamic time warping and vector quantisation


Abstract

The authors evaluate continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition, emphasising the performance of each model structure across incremental amounts of training data. Text-independent (TI) experiments are performed with VQ and CDHMMs, and text-dependent (TD) experiments are performed with DTW, VQ and CDHMMs. For TI speaker recognition, VQ performs better than an equivalent CDHMM with one training version, but is outperformed by CDHMM when trained with ten training versions. For TD experiments, DTW outperforms VQ and CDHMMs for sparse amounts of training data, but with more data the performance of each model is indistinguishable. The performance of the TD procedures is consistently superior to TI, which is attributed to subdividing the speaker recognition problem into smaller speaker-word problems. It is also shown that there is a large variation in performance across the different digits, and it is concluded that digit zero is the best digit for speaker discrimination.
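To make two of the compared techniques concrete, the following is a minimal sketch of a DTW alignment cost and a distortion-based VQ speaker decision. It is not the authors' implementation: the abstract does not specify their features, distance measure, path constraints, or codebook training, so the Euclidean distance, the unconstrained three-way DTW step, and the names `dtw_distance`, `vq_distortion` and `identify` are all assumptions chosen for illustration. Codebooks are assumed to be already trained (for example by k-means clustering).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Cumulative DTW alignment cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Unconstrained step pattern: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def vq_distortion(frames, codebook):
    """Average distance from each frame to its nearest codebook vector."""
    return sum(min(euclidean(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Return the speaker whose (pre-trained) codebook gives the lowest distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(frames, codebooks[spk]))
```

In a text-dependent setting of the kind the abstract describes, `dtw_distance` would compare a test utterance of a given word against each speaker's reference template for that word, whereas `identify` ignores frame order and so can also be used text-independently.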
