Computations and evaluations of an optimal feature-set for an HMM-based recognizer.

Abstract

A speech recognition machine would offer many benefits and improve people's quality of life. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and the back-end. The front-end converts the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity, and efficiency. Three problems make it difficult to find optimal features: (1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, (2) the size of the feature space, and (3) the dependence of features upon some words in the vocabulary.

This thesis proposes solutions to all three problems. The evaluation problem is addressed by designing an advanced architecture. The architecture reconfigures itself for fast computation based on the source code and takes advantage of the structure of the semi-continuous hidden Markov model computations. This thesis demonstrates how an inexpensive reconfigurable system outperforms a fast general-purpose computer. The feature space problem is addressed by investigating discrete Fourier transform (DFT) based feature-sets. Two parameters are used to control the spectral compression of the features. The parameterized feature-set with mel-scale compression is shown to be superior. The parameterized system decreased the error rate of the standard mel-cepstrum LPC system by over 21%, to 8.2%. Recognition performance improved on all highly confusable sets. The DFT-based signal processing increased error rates for voiced-to-unvoiced stop confusions but distinguished place of articulation well. The decreased error rates on the nasals were expected, since LPC models spectral zeros poorly. To improve performance on specific words in the vocabulary, the small but difficult nasal-set is investigated. A hierarchical method improved the performance of this set.

This thesis, perhaps for the first time, has shown that the mel-scale compression of the human auditory system is also ideal for machine speech recognition. The reconfigurable architecture will enable further investigation of complex parameterizations of the feature space.
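As a rough illustration of what mel-scale spectral compression means for a DFT-based front-end, the sketch below places filter-bank center frequencies at equal steps on the mel axis. It is a minimal sketch under stated assumptions, not the thesis's method: the abstract does not define its two compression parameters, so none are modeled here. The widely used formula mel(f) = 2595 log10(1 + f/700) is assumed, and the function names, filter count, and band edges are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (common 2595*log10(1 + f/700) form; an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low_hz, f_high_hz):
    """Return n_filters center frequencies (Hz) equally spaced on the mel axis.

    The two band edges themselves are excluded, as in a typical
    triangular filter bank applied to a DFT magnitude spectrum.
    """
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 20 filters over a 0-4000 Hz band.
centers = mel_center_frequencies(n_filters=20, f_low_hz=0.0, f_high_hz=4000.0)
```

Because the steps are equal in mels, the spacing in Hz widens toward high frequencies, so low frequencies are resolved finely and high frequencies are compressed.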

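Bibliographic record

  • Author

    Mashao, Daniel Johannes

  • Author affiliation

    Brown University

  • Degree-granting institution: Brown University
  • Subject: Engineering, Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 1996
  • Pagination: 148 p.
  • Total pages: 148
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords: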
