
Nonlinear discriminant analysis based feature dimensionality reduction for automatic speech recognition.


Abstract

Automatic Speech Recognition (ASR) has advanced to the point where state-of-the-art algorithms perform reasonably well even for large-vocabulary continuous speech recognition in practical environments. Among speech recognition problems, feature extraction, which compresses a speech signal into streams of acoustic feature vectors, has become even more important for ASR, since acoustic modeling methods are well established and language modeling largely depends on the nature of the target language. The focus of this dissertation is the determination of effective speech features for recognition tasks, in which both the spectral and temporal variations of speech are captured in a low-dimensional representation.

In this dissertation, a set of spectral-temporal features, namely Discrete Cosine Transform Coefficients (DCTCs) and Discrete Cosine Series Coefficients (DCSCs), is examined for the purpose of capturing both the spectral and temporal variations in speech. Experimental evaluations showed that temporal variations are also of great importance for speech recognition, especially when a long time context is used.

Additionally, to reduce the limitations of acoustic modeling based on Hidden Markov Models (HMMs), a neural network is used as a feature transformer to maximize the discrimination and lessen the correlation of the DCTC/DCSC features. The transformed features yield a large improvement in phoneme recognition on the TIMIT database, especially when a small number of states and Gaussian mixtures are used for the HMMs. The neural network feature transforms are viewed as two types of Nonlinear Discriminant Analysis (NLDA) methods for nonlinear dimensionality reduction of speech features, since high-dimensional features considerably increase computation cost and greatly restrict performance improvement.
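The DCTC/DCSC idea described above — a cosine transform across frequency for each frame, followed by a cosine transform across time of each coefficient trajectory — can be sketched as follows. This is a minimal illustration only: the window length, coefficient counts, and the toy "log spectrum" are assumptions for the example, not the dissertation's actual parameters (which also involve frequency warping and time windowing).

```python
import math

def dct_coeffs(values, num_coeffs):
    """Keep the first num_coeffs terms of a type-II discrete cosine
    transform of `values` -- a smooth, low-dimensional summary."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (t + 0.5) / n)
            for t, v in enumerate(values))
        for k in range(num_coeffs)
    ]

# DCTCs: cosine transform across frequency of one frame's log spectrum.
# The 64-point "spectrum" below is synthetic, purely for illustration.
log_spectrum = [math.log1p(100 * abs(math.sin(0.1 * i))) for i in range(64)]
dctcs = dct_coeffs(log_spectrum, 10)          # 10 spectral coefficients

# DCSCs: cosine transform across time of each DCTC trajectory over a
# block of frames, compressing temporal variation into a few terms.
frames = [[c * (1 + 0.05 * f) for c in dctcs] for f in range(20)]  # toy trajectories
dcscs = [dct_coeffs([fr[k] for fr in frames], 4) for k in range(10)]
```

Each frame is thus reduced to a small spectral vector, and each block of frames to a small spectral-temporal matrix, which is the sense in which the representation captures both kinds of variation at low dimensionality.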
The first method (NLDA1) obtains dimensionality-reduced features from the final outputs of the network, followed by Principal Component Analysis (PCA) processing, while the second (NLDA2) uses the outputs of the middle layer. The highest phone accuracy obtained with NLDA2 on the TIMIT database was 75.0%, using a large number of network training iterations with state-specific targets.
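The two transforms can be sketched as below. Everything here is a hypothetical stand-in: the layer sizes (30-dimensional input, 500-unit hidden layers, 39-unit middle layer, 48 output targets) and the untrained random weights are illustrative assumptions; in the actual work the network is trained on phone- or state-level targets before its activations are used as features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for trained network weights (all shapes hypothetical):
# 30-dim DCTC/DCSC input -> 500 -> 39 (middle layer) -> 500 -> 48 targets.
W = [rng.standard_normal((30, 500)), rng.standard_normal((500, 39)),
     rng.standard_normal((39, 500)), rng.standard_normal((500, 48))]

def forward(x, upto):
    """Propagate x through the first `upto` layers (tanh activations)."""
    for w in W[:upto]:
        x = np.tanh(x @ w)
    return x

# NLDA2: take the middle-layer outputs directly as reduced features.
x = rng.standard_normal(30)
nlda2_feat = forward(x, 2)                     # 39-dim feature vector

# NLDA1: take final-layer outputs, then decorrelate/reduce with PCA.
outputs = np.array([forward(rng.standard_normal(30), 4) for _ in range(200)])
centered = outputs - outputs.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
proj = eigvecs[:, np.argsort(eigvals)[::-1][:39]]   # top 39 components
nlda1_feats = centered @ proj                  # 200 x 39 feature matrix
```

In both cases the network supplies the nonlinearity, and the reduced features (middle-layer activations for NLDA2, PCA-rotated output activations for NLDA1) are what would then be modeled by the HMMs.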

Record details

  • Author

    Hu, Hongbing.

  • Affiliation

    State University of New York at Binghamton.

  • Degree grantor: State University of New York at Binghamton.
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 145 p.
  • Total pages: 145
  • Format: PDF
  • Language: English (eng)
