
Classification and recognition of speech under perceptual stress using neural networks and N-D HMMs.

Abstract

The primary contribution of this study is the formulation of a stress classification algorithm. The secondary contribution is the formulation of a multi-dimensional hidden Markov model (N-D HMM) for unified stressed speech classification and recognition. Perceptually induced stress affects a speaker's intention to produce speech due to the presence of emotion, environmental noise (i.e., Lombard effect), or actual task workload. Analysis of articulatory, excitation, and cepstral based features is conducted using a previously established stressed speech database (SUSAS). Targeted feature sets are selected across ten stress conditions (Apache helicopter, Angry, Clear, Fast, Lombard effect, Loud, Slow, Soft, and two workload tasks). Four stress classification approaches are formulated using both neural network and hidden Markov model based systems. Stress classification rates for the neural network based mono-partition non-targeted feature and tri-partition targeted feature algorithms are 56.68% (5 words, 1 speaker) and 91.01% (35 words, 11 speakers) across ten stress conditions for specific application scenarios. The stress classification rate for both the 1-D and N-D HMM across Neutral, Angry, Clear and Lombard effect speech is 57.6%, with the N-D model yielding greater stress score separation. Stress directed speaker independent speech recognition is shown to improve performance over Neutral and multi-style trained speech recognizers by +10.95% and +15.43%. Finally, the N-D HMM is used to unify the stress classification and stress dependent speech recognition tasks. The N-D HMM structure is derived from Markov Random Field theory enabling an explicit sub-phoneme stress classification at the state level. This formulation better integrates perceptually induced stress effects. An improvement of +15.72% is observed for the N-D HMM at 94.41% over the 1-D HMM based stress directed speech recognition system. This is +26.67% better than the Neutral trained 1-D HMM which has a recognition rate of 67.74%. It is suggested that the developed stress classification algorithms are applicable to other speech under stress environments, yielding significant performance gains in speech processing systems due to the incorporation of speaker stress effects.
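
The gains quoted in the abstract appear to be absolute differences in percentage points rather than relative improvements: 94.41% minus 67.74% gives the stated +26.67%. The short sketch below checks that arithmetic and, under the same assumption, derives the recognition rate implied for the 1-D HMM stress directed system. That derived figure is not stated in the abstract, and the constant and function names are illustrative only.

```python
# Rates quoted in the abstract, in percent.
ND_HMM_RATE = 94.41                 # N-D HMM recognition rate
NEUTRAL_1D_RATE = 67.74             # Neutral trained 1-D HMM recognition rate
GAIN_OVER_STRESS_DIRECTED = 15.72   # reported N-D gain over the 1-D stress directed system


def absolute_gain(new_rate: float, baseline_rate: float) -> float:
    """Difference between two rates in percentage points, rounded to two decimals."""
    return round(new_rate - baseline_rate, 2)


def implied_baseline(new_rate: float, gain: float) -> float:
    """Baseline rate implied by a new rate and an absolute gain (assumption: gain is in points)."""
    return round(new_rate - gain, 2)


# Check the stated gain over the Neutral trained 1-D HMM.
gain_over_neutral = absolute_gain(ND_HMM_RATE, NEUTRAL_1D_RATE)

# Derived, not stated in the abstract: rate of the 1-D stress directed system.
implied_stress_directed_rate = implied_baseline(ND_HMM_RATE, GAIN_OVER_STRESS_DIRECTED)

print(f"Gain over Neutral trained 1-D HMM: {gain_over_neutral} points")
print(f"Implied 1-D stress directed rate: {implied_stress_directed_rate}%")
```

Read as a relative improvement, the same pair of rates would give a noticeably larger figure than +26.67%, which is why the absolute reading is assumed here.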

Bibliographic record

  • Author

    Womack, Brian David.;

  • Author affiliation

    Duke University.;

  • Degree-granting institution Duke University.;
  • Subject Electrical engineering.; Artificial intelligence.; Speech therapy.
  • Degree Ph.D.
  • Year 1996
  • Pages 144 p.
  • Total pages 144
  • Original format PDF
  • Language eng
  • Chinese Library Classification
  • Keywords

  • Date indexed 2022-08-17 11:49:30
