Pronunciation modeling for spontaneous Mandarin speech recognition.

Abstract

The focus of automatic speech recognition (ASR) research has gradually shifted from read speech to spontaneous speech. ASR systems can reach an accuracy above 90% when evaluated on read speech, but their accuracy on spontaneous speech is much lower. This high error rate is due in part to the poor modeling of pronunciations in spontaneous speech. An analysis of pronunciation variations at the acoustic level reveals that they include both complete changes and partial changes. Complete changes are the replacement of a canonical phoneme by an alternative phone, such as ‘b’ being pronounced as ‘p’. Partial changes are variations within the phoneme and include diacritics such as nasalization, centralization, voiceless, and voiced. Most current work in pronunciation modeling attempts to represent pronunciation variations either by alternative phonetic representations or by the concatenation of subphone units at the state level. In this dissertation, we show that partial changes are far less clear-cut than previously assumed and cannot be modeled merely by alternative phone representations or by concatenating phone units. When partial changes occur, a phone is not completely substituted, deleted, or inserted, and the acoustic representation at the phone level is often ambiguous. We suggest that, in addition to phonetic representations of pronunciation variations, the ambiguity of acoustic representations caused by partial changes should be taken into account. The acoustic model for spontaneous speech should differ from that for read and planned speech: it should have a strong ability to cover partial changes.

We propose modeling partial changes by combining the pronunciation model with the acoustic model at the state level. Based on this pronunciation model, we reconstruct the acoustic model to improve its resolution without sacrificing the model's identity, with the goal of accommodating pronunciation variations. The effectiveness of this approach was evaluated on the Hub4NE Mandarin Broadcast News Corpus with different styles of speech. The results show that the new pronunciation modeling approach does not help much for pre-planned speech, but it provides a significant gain for spontaneous speech.

To the best of our knowledge, this dissertation is the first of its kind to systematically investigate both complete changes and partial changes in spontaneous Mandarin speech. The results reported in this dissertation demonstrate that our approaches are both efficient and effective.
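To make the notion of a complete change concrete, the following minimal Python sketch estimates P(surface phone | canonical phone) from aligned phone pairs, which is one common way of building alternative phonetic representations. It is an illustration only, not the dissertation's method: the function name `estimate_substitution_probs` and the toy alignment data are assumptions made here. It covers only whole-phone substitutions, not the partial changes that the abstract argues such representations cannot capture.

```python
from collections import Counter, defaultdict


def estimate_substitution_probs(aligned_pairs):
    """Estimate P(surface | canonical) from (canonical, surface) phone pairs.

    Hypothetical helper for illustration; the pairs are assumed to come from
    aligning a canonical transcription with a surface-form transcription.
    """
    counts = defaultdict(Counter)
    for canonical, surface in aligned_pairs:
        counts[canonical][surface] += 1

    probs = {}
    for canonical, surface_counts in counts.items():
        total = sum(surface_counts.values())
        # Relative frequency of each surface phone for this canonical phone.
        probs[canonical] = {s: c / total for s, c in surface_counts.items()}
    return probs


# Toy, made-up alignment data (not taken from the dissertation).
pairs = [("b", "b"), ("b", "b"), ("b", "p"), ("b", "b"), ("zh", "z"), ("zh", "zh")]
probs = estimate_substitution_probs(pairs)
```

In this toy data, `probs["b"]["p"]` is the relative frequency with which canonical ‘b’ surfaces as ‘p’. A partial change such as a nasalized or centralized phone has no entry in a table like this, which is the limitation the abstract points to.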

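Bibliographic details

  • Author

    Liu, Yi.;

  • Author affiliation

    Hong Kong University of Science and Technology (People's Republic of China).;

  • Degree-granting institution Hong Kong University of Science and Technology (People's Republic of China).;
  • Subject Computer Science.; Language Linguistics.
  • Degree Ph.D.
  • Year 2002
  • Pages 181 p.
  • Total pages 181
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Automation and computer technology; Linguistics;
  • Keywords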
