Source: ACM Transactions on Asian Language Information Processing

Model Generation of Accented Speech using Model Transformation and Verification for Bilingual Speech Recognition


Abstract

Bilingual and multilingual speech recognition systems deployed in real-world applications must cope with accent-related problems caused by non-native speech. Accent modeling of non-native speech is challenging because the acoustic properties of highly accented speech produced by non-native speakers vary widely. The aim of this study is to generate highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. First, a two-stage, state-based verification method is proposed to automatically extract state-level, highly accented speech segments; acoustic features and then articulatory features are used for robust verification of the extracted segments. Second, Gaussian components of the highly accented speech models are generated from the corresponding Gaussian components of the native speech models using a linear transformation function. A decision tree is constructed to categorize the transformation functions and is used to retrieve a suitable transformation function when data are sparse. Third, a discrimination function is applied to verify the generated accented acoustic models. Finally, the successfully verified accented English models are integrated into the native bilingual phone model set for Mandarin-English bilingual speech recognition. Experimental results show that the proposed approach effectively alleviates the recognition performance degradation caused by accents, yielding absolute improvements in word accuracy of 4.1%, 1.8%, and 2.7% for bilingual speech recognition over traditional ASR, MAP-adapted ASR, and MLLR-adapted ASR methods, respectively.
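The second step of the abstract, generating an accented Gaussian component from a native one through a linear transformation, can be illustrated with a small sketch. The abstract does not give the exact form of the transformation, so the code below assumes an affine map x -> A x + b, under which the mean becomes A mu + b and the covariance becomes A Sigma A^T. The function names, the matrix `A`, the vector `bias`, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Return the matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Return the matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a."""
    return [list(col) for col in zip(*a)]


def transform_gaussian(
    mean: Vector, cov: Matrix, a: Matrix, b: Vector
) -> Tuple[Vector, Matrix]:
    """Map a Gaussian N(mean, cov) through the affine function x -> a @ x + b.

    Assumption (not stated in the abstract): the transformation acts on both
    the mean and the covariance of each Gaussian component.
    """
    new_mean = [m + b_i for m, b_i in zip(mat_vec(a, mean), b)]
    new_cov = mat_mul(mat_mul(a, cov), transpose(a))
    return new_mean, new_cov


# Toy 2-dimensional "native" Gaussian component (made-up values).
native_mean = [1.0, 2.0]
native_cov = [[1.0, 0.0], [0.0, 4.0]]

# Hypothetical transformation parameters for one accent class.
A = [[2.0, 0.0], [0.0, 0.5]]
bias = [0.5, -1.0]

accented_mean, accented_cov = transform_gaussian(native_mean, native_cov, A, bias)
print(accented_mean)  # [2.5, 0.0]
print(accented_cov)   # [[4.0, 0.0], [0.0, 1.0]]
```

In a setup like the one the abstract describes, one such (`A`, `bias`) pair would presumably be stored per leaf of the decision tree, so that a component with too little accented data can reuse the transformation retrieved for its category.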
