Conference proceedings: Advances in Nonlinear Speech Processing

Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition



Abstract

Hidden Markov Models and Mel Frequency Cepstral Coefficients (MFCCs) are a de facto standard for Automatic Speech Recognition (ASR) systems, but they fail to capture the nonlinear dynamics present in speech waveforms. The extra information provided by nonlinear features could be especially useful when training data is scarce or when the ASR task is very complex. In this work, the Fractal Dimension (FD) of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems: the first is a very simple system with very few training examples, and the second is a Large Vocabulary Continuous Speech Recognition system for Broadcast News.
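The idea of appending a fractal dimension to each MFCC vector can be sketched as follows. The abstract does not name an FD estimator, so this sketch assumes the Katz estimator purely as a stand-in; the function names (`katz_fd`, `split_frames`, `append_fd`), the frame size and hop, and the assumption that samples are normalized to roughly [-1, 1] are all illustrative choices, not the authors' method. The MFCC rows are taken as given, one per frame, from whatever front end is already in use.

```python
import math


def katz_fd(frame):
    """Katz fractal dimension of one frame, treating samples as points (i, x_i).

    Assumption: samples are normalized to about [-1, 1]; the estimate
    depends on amplitude scale.
    """
    n = len(frame) - 1  # number of steps along the curve
    if n < 2:
        raise ValueError("frame needs at least 3 samples")
    # Total curve length: sum of distances between successive points.
    length = sum(math.hypot(1.0, frame[i + 1] - frame[i]) for i in range(n))
    # Extent: largest distance from the first point to any later point.
    extent = max(math.hypot(i, frame[i] - frame[0]) for i in range(1, n + 1))
    return math.log10(n) / (math.log10(n) + math.log10(extent / length))


def split_frames(signal, size, hop):
    """Cut the signal into overlapping frames (hypothetical framing scheme)."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def append_fd(mfcc_rows, signal, size, hop):
    """Append one FD value to each per-frame MFCC vector."""
    fds = [katz_fd(f) for f in split_frames(signal, size, hop)]
    if len(fds) != len(mfcc_rows):
        raise ValueError("MFCC rows and signal frames must align one to one")
    return [list(row) + [fd] for row, fd in zip(mfcc_rows, fds)]
```

With 13 MFCCs per frame, `append_fd` would yield 14-dimensional feature vectors. A straight-line frame gives an FD of about 1, and more irregular frames give larger values, which is the extra information the combined vector is meant to carry.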
