Journal: IEICE Transactions on Information and Systems

A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition


Abstract

In this paper, we propose a hybrid model adaptation approach that adapts both the pronunciation model and the acoustic model by incorporating the pronunciation and acoustic variability of non-native speech, with the goal of improving non-native automatic speech recognition (ASR). The proposed adaptation can be performed at either the state-tying level or the triphone-modeling level, depending on the level at which the acoustic model is adapted. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method adapts the pronunciation model by adding the pronunciation variants to the pronunciation dictionary, and adapts the acoustic model by clustering the states of the triphone acoustic models using the acoustic variants. The triphone-modeling level hybrid method adapts the pronunciation model in the same way; for the acoustic model, however, the triphone acoustic models are first re-estimated based on the adapted pronunciation model, and the states of the re-estimated triphone models are then clustered using the acoustic variants. In speech recognition experiments on English spoken by Korean speakers, ASR systems employing the state-tying level and triphone-modeling level adaptation methods reduce the average word error rate (WER) for non-native speech by 17.1% and 22.1% relative, respectively, compared with a baseline ASR system.
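
The pronunciation-model half of the approach, and the way a relative WER reduction is computed, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule format, the phone symbols, the example /f/ to /p/ substitution, the function names `expand_lexicon` and `relative_wer_reduction`, and the numeric WER values in the usage note are all assumptions made for illustration, and the abstract does not state how rules are classified or what the absolute WERs were. Rules tagged as acoustic variants are skipped here because, per the abstract, they drive the state clustering of the triphone models rather than the dictionary.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]          # one pronunciation = a sequence of phone symbols
Rule = Tuple[str, str, str]     # (source phone, target phone, "pronunciation" | "acoustic")


def expand_lexicon(lexicon: Dict[str, List[Pron]], rules: List[Rule]) -> Dict[str, List[Pron]]:
    """Return a copy of the lexicon with variant pronunciations added.

    Only rules tagged "pronunciation" are applied (hypothetical rule format).
    Each applicable rule adds at most one variant per original pronunciation,
    replacing every occurrence of the source phone with the target phone.
    """
    adapted: Dict[str, List[Pron]] = {}
    for word, prons in lexicon.items():
        variants = list(prons)  # keep the canonical pronunciations
        for pron in prons:
            for src, dst, kind in rules:
                if kind != "pronunciation" or src not in pron:
                    continue
                variant = tuple(dst if p == src else p for p in pron)
                if variant not in variants:
                    variants.append(variant)
        adapted[word] = variants
    return adapted


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - adapted) / baseline."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Illustrative data only: a one-word lexicon and two hypothetical rules.
lexicon = {"coffee": [("k", "ao", "f", "iy")]}
rules = [("f", "p", "pronunciation"), ("iy", "ih", "acoustic")]
adapted_lexicon = expand_lexicon(lexicon, rules)
```

With these inputs, `adapted_lexicon["coffee"]` holds the canonical pronunciation plus the variant with /p/ in place of /f/, while the rule tagged as acoustic leaves the dictionary unchanged. As an arithmetic illustration with made-up numbers, `relative_wer_reduction(20.0, 15.0)` gives 25.0, that is, a drop of 5 absolute points from a 20% baseline is a 25% relative reduction; the 17.1% and 22.1% figures in the abstract are relative reductions of this kind.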
