Conference: Annual Conference of the International Speech Communication Association

Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition

Abstract

State-of-the-art Automatic Speech Recognition (ASR) systems struggle to handle accented speech, particularly if the target accent is under-represented in the training data. The acoustic variations introduced by an unfamiliar accent make the ASR polyphone decision tree (PDT) and its associated Gaussian mixture models (GMMs) a poor fit for the test data. In this paper, we improve on previous work on adapting the polyphone decision tree, using a semi-continuous-model-based approach to address the problem of data sparsity. We extend the existing PDT to introduce additional states with shared parameters, corresponding to the new contextual variations identified in the adaptation data, while still robustly estimating the state-specific parameters on a relatively small dataset. We conduct ASR experiments on Arabic and English accents and show that our technique performs better than Maximum A Posteriori (MAP) adaptation and a previous implementation of polyphone decision tree specialization (PDTS). Compared to the MAP-adapted system, we obtain a 7% relative improvement in Word Error Rate (WER) for Arabic and a 13.7% relative improvement for English accent adaptation.
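
The improvements quoted in the abstract are relative, meaning the WER reduction is divided by the baseline (MAP-adapted) WER rather than reported as an absolute percentage-point drop. The sketch below shows that calculation. The function name and the WER values are hypothetical and chosen only for illustration; the abstract does not report absolute WERs.

```python
def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs (in percent) for illustration only; NOT results from the paper.
hypothetical_map_wer = 20.0   # assumed baseline: MAP-adapted system
hypothetical_new_wer = 18.6   # assumed WER after PDT adaptation

improvement = relative_wer_improvement(hypothetical_map_wer, hypothetical_new_wer)
print(f"relative improvement: {improvement:.1f}%")
```

With these assumed values, an absolute drop of 1.4 points corresponds to a relative improvement of about 7%, which is how a figure such as the one reported for Arabic should be read.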
