
Nonlinear discriminant analysis based feature dimensionality reduction for automatic speech recognition.


Abstract

Automatic Speech Recognition (ASR) has advanced to the point where state-of-the-art algorithms perform reasonably well even for large-vocabulary continuous speech recognition in practical environments. Among speech recognition problems, feature extraction, which compresses a speech signal into streams of acoustic feature vectors, has become even more important for ASR, since acoustic modeling methods are well established and language modeling largely depends on the nature of the target language. The focus of this dissertation is the determination of effective speech features for recognition tasks, in which both the spectral and temporal variations of speech are captured in a low-dimensional representation.

In this dissertation, a set of spectral-temporal features, namely Discrete Cosine Transform Coefficients (DCTCs) and Discrete Cosine Series Coefficients (DCSCs), is examined for the purpose of capturing both the spectral and temporal variations in speech. Experimental evaluations showed that temporal variations are also of great importance for speech recognition, especially when a long time context is used.

Additionally, to reduce the limitations of acoustic modeling based on Hidden Markov Models (HMMs), a neural network is used as a feature transformer to maximize the discrimination and lessen the correlation of the DCTC/DCSC features. The transformed features yield a large improvement in phoneme recognition on the TIMIT database, especially when a small number of states and Gaussian mixtures are used for the HMMs. The neural network feature transforms are viewed as two types of Nonlinear Discriminant Analysis (NLDA) methods for nonlinear dimensionality reduction of speech features, since high-dimensional features considerably increase computation cost and greatly restrict performance improvement.
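The DCTC/DCSC idea described above — a cosine transform across frequency for each frame, followed by a cosine transform across time of each coefficient trajectory — can be sketched as follows. This is a minimal illustration only: the window length, coefficient counts, and the toy "log spectrum" are assumptions for the example, not the dissertation's actual parameters (which also involve frequency warping and time windowing).

```python
import math

def dct_coeffs(values, num_coeffs):
    """Keep the first num_coeffs terms of a type-II discrete cosine
    transform of `values` -- a smooth, low-dimensional summary."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (t + 0.5) / n)
            for t, v in enumerate(values))
        for k in range(num_coeffs)
    ]

# DCTCs: cosine transform across frequency of one frame's log spectrum.
# The 64-point "spectrum" below is synthetic, purely for illustration.
log_spectrum = [math.log1p(100 * abs(math.sin(0.1 * i))) for i in range(64)]
dctcs = dct_coeffs(log_spectrum, 10)          # 10 spectral coefficients

# DCSCs: cosine transform across time of each DCTC trajectory over a
# block of frames, compressing temporal variation into a few terms.
frames = [[c * (1 + 0.05 * f) for c in dctcs] for f in range(20)]  # toy trajectories
dcscs = [dct_coeffs([fr[k] for fr in frames], 4) for k in range(10)]
```

Each frame is thus reduced to a small spectral vector, and each block of frames to a small spectral-temporal matrix, which is the sense in which the representation captures both kinds of variation at low dimensionality.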
The first method (NLDA1) obtains dimensionality-reduced features from the final outputs of the network, followed by Principal Component Analysis (PCA) processing, while the second (NLDA2) uses the outputs of the middle layer. The highest phone accuracy obtained with NLDA2 on the TIMIT database was 75.0%, using a large number of network training iterations with state-specific targets.
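The two transforms can be sketched as below. Everything here is a hypothetical stand-in: the layer sizes (30-dimensional input, 500-unit hidden layers, 39-unit middle layer, 48 output targets) and the untrained random weights are illustrative assumptions; in the actual work the network is trained on phone- or state-level targets before its activations are used as features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for trained network weights (all shapes hypothetical):
# 30-dim DCTC/DCSC input -> 500 -> 39 (middle layer) -> 500 -> 48 targets.
W = [rng.standard_normal((30, 500)), rng.standard_normal((500, 39)),
     rng.standard_normal((39, 500)), rng.standard_normal((500, 48))]

def forward(x, upto):
    """Propagate x through the first `upto` layers (tanh activations)."""
    for w in W[:upto]:
        x = np.tanh(x @ w)
    return x

# NLDA2: take the middle-layer outputs directly as reduced features.
x = rng.standard_normal(30)
nlda2_feat = forward(x, 2)                     # 39-dim feature vector

# NLDA1: take final-layer outputs, then decorrelate/reduce with PCA.
outputs = np.array([forward(rng.standard_normal(30), 4) for _ in range(200)])
centered = outputs - outputs.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
proj = eigvecs[:, np.argsort(eigvals)[::-1][:39]]   # top 39 components
nlda1_feats = centered @ proj                  # 200 x 39 feature matrix
```

In both cases the network supplies the nonlinearity, and the reduced features (middle-layer activations for NLDA2, PCA-rotated output activations for NLDA1) are what would then be modeled by the HMMs.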

Record details

  • Author

    Hu, Hongbing.

  • Affiliation

    State University of New York at Binghamton.

  • Degree grantor: State University of New York at Binghamton.
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 145 p.
  • Total pages: 145
  • Format: PDF
  • Language: English (eng)
