Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition

Abstract

State-of-the-art Automatic Speech Recognition (ASR) systems struggle to handle accented speech, particularly if the target accent is under-represented in the training data. The acoustic variations introduced by an unfamiliar accent make the ASR polyphone decision tree (PDT) and its associated Gaussian mixture models (GMMs) a poor fit for the test data. In this paper, we improve on previous work on adapting the polyphone decision tree, using a semi-continuous model-based approach to address the problem of data sparsity. We extend the existing PDT with additional states that share parameters and correspond to the new contextual variations identified in the adaptation data, while still robustly estimating the state-specific parameters on a relatively small dataset. We conduct ASR experiments on Arabic and English accents and show that our technique performs better than Maximum A Posteriori (MAP) adaptation and a previous implementation of polyphone decision tree specialization (PDTS). Compared to the MAP-adapted system, we obtain a 7% relative improvement in Word Error Rate (WER) for Arabic and a 13.7% relative improvement for English accent adaptation.
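
The improvements quoted in the abstract are relative: each is the WER reduction divided by the baseline system's WER, not a difference in percentage points. The sketch below shows only that arithmetic. The function name and the WER values are hypothetical, since the abstract does not report absolute WERs.

```python
def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction over the baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs chosen only to illustrate the formula (not from the paper).
baseline = 25.0  # e.g. WER (%) of a MAP-adapted baseline
adapted = 22.5   # e.g. WER (%) after further adaptation
print(relative_wer_improvement(baseline, adapted))  # 10.0
```

In this made-up case, an absolute drop of 2.5 WER points corresponds to a 10% relative improvement.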

Bibliographic record

Source
Annual Conference of the International Speech Communication Association (INTERSPEECH) | 2012 | pp. 1900-1903 | 4 pages
Authors
Udhyakumar Nallasamy; Florian Metze; Tanja Schultz
Original format
PDF
Keywords
automatic speech recognition; accent adaptation

