
Modeling partial pronunciation variations for spontaneous Mandarin speech recognition



Abstract

The high error rate in spontaneous speech recognition is due in part to the poor modeling of pronunciation variations. An analysis of acoustic data reveals that pronunciation variations include both complete changes and partial changes. Complete changes are the replacement of a canonical phoneme by an alternative phone, such as 'b' being pronounced as 'p'. Partial changes are variations within the phoneme, such as nasalization, centralization, devoicing, and voicing. Most current work in pronunciation modeling attempts to represent pronunciation variations either by alternative phonetic representations or by the concatenation of subphone units at the hidden Markov state level. In this paper, we show that partial changes are much less clear-cut than previously assumed and cannot be modeled merely by alternate phones or by a concatenation of phone units. We propose modeling partial changes through acoustic model reconstruction. We first propose a partial change phone model (PCPM) to differentiate pronunciation variations. To improve model resolution without greatly increasing the number of parameters, the PCPM is used as a hidden model and merged into the pre-trained acoustic model through model reconstruction. To avoid model confusion, auxiliary decision trees are established for PCPM triphones, and each auxiliary decision tree can be used by only one standard decision tree. Acoustic model reconstruction on triphones is equivalent to decision tree merging. The effectiveness of this approach is evaluated on the 1997 Hub4NE Mandarin Broadcast News corpus (1997 MBN) with different styles of speech. It gives a significant 2.39% absolute reduction in syllable error rate on spontaneous speech.
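The abstract reports its result as an absolute reduction in syllable error rate. As a rough illustration of that metric, the Python sketch below computes a syllable error rate from the edit distance between a reference and a hypothesis syllable sequence, then takes the absolute difference between two systems. It assumes the common definition (substitutions + deletions + insertions, divided by the number of reference syllables); the abstract does not say which scoring tool the authors used, and the function names and toy pinyin sequences here are hypothetical, not data from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def syllable_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the number of reference syllables."""
    if not ref:
        raise ValueError("reference must contain at least one syllable")
    return edit_distance(ref, hyp) / len(ref)


def absolute_reduction(baseline: float, improved: float) -> float:
    """Absolute (not relative) reduction: a plain difference of two rates."""
    return baseline - improved


# Hypothetical toy data, for illustration only.
reference = ["zhe", "shi", "yi", "ge", "li", "zi"]
baseline_hyp = ["zhe", "si", "yi", "li", "zi"]   # one substitution, one deletion
improved_hyp = ["zhe", "shi", "yi", "li", "zi"]  # one deletion

baseline_ser = syllable_error_rate(reference, baseline_hyp)
improved_ser = syllable_error_rate(reference, improved_hyp)
print(f"baseline SER: {baseline_ser:.2%}")
print(f"improved SER: {improved_ser:.2%}")
print(f"absolute reduction: {absolute_reduction(baseline_ser, improved_ser):.2%}")
```

An absolute reduction is a difference in percentage points, so it says nothing on its own about the relative improvement; that would also require the baseline error rate, which the abstract does not state.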

Bibliographic details

  • Source
    Computer Speech and Language | 2003, Issue 4 | pp. 357-379 | 23 pages
  • Authors

    Yi Liu; Pascale Fung;

  • Affiliations

    Human Language Technology Center, Department of Electrical and Electronic Engineering, University of Science and Technology, Clear Water Bay, Hong Kong;

    Human Language Technology Center, Department of Electrical and Electronic Engineering, University of Science and Technology, Clear Water Bay, Hong Kong;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • CLC classification: Computing technology, computer technology
