...
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training


Abstract

This paper presents a two-pass framework with discriminative acoustic modeling for mispronunciation detection and diagnosis (MD&D). The first pass, mispronunciation detection, does not require explicit modeling of phonetic error patterns. For each canonical phone, the framework instantiates a set of antiphones and a filler model to augment the original phone model. This guarantees full coverage of all possible error patterns while maximally exploiting the phonetic information derived from the text prompt. The antiphones detect substitutions, the filler model detects insertions, and phone skips are allowed in order to detect deletions. As such, no prior assumption is made about which error patterns can occur. The second pass, mispronunciation diagnosis, expands the detected insertions and substitutions into phone networks, and another recognition pass attempts to reveal the phonetic identities of the detected mispronunciations. Discriminative training (DT) is applied separately to the acoustic models of the detection pass and the diagnosis pass. DT effectively separates the acoustic models of the canonical phones from those of the antiphones. Overall, with DT in both passes of MD&D, the error rate is reduced by 40.4% relative to the maximum likelihood baseline. After DT, the error rates of the respective passes are also lower than those of a strong single-pass baseline with DT by 1.3% and 5.1% relative, and these differences are statistically significant.
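The two passes described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no data structures, so the slot layout, the `anti_` naming scheme, the `<filler>` label, and the use of a single antiphone per canonical phone are all assumptions made for illustration. The sketch builds a first-pass network from the prompt's canonical phones, maps a decoded path to error types, and lists the candidate phones that a second pass could consider for each detected substitution or insertion.

```python
FILLER = "<filler>"  # hypothetical label for the filler model
SKIP = None          # empty arc: nothing was spoken at this slot


def antiphone(phone):
    """Hypothetical naming scheme for the antiphone of a canonical phone."""
    return "anti_" + phone


def build_detection_network(canonical_phones):
    """Return a list of slots; each slot lists the arcs a decoder may take.

    Assumption: one antiphone per canonical phone, and an optional filler
    before, between, and after the canonical phones.
    """
    network = [{"kind": "gap", "arcs": [SKIP, FILLER]}]
    for phone in canonical_phones:
        network.append({
            "kind": "phone",
            "canonical": phone,
            # canonical phone, its antiphone, or a skip (deletion)
            "arcs": [phone, antiphone(phone), SKIP],
        })
        network.append({"kind": "gap", "arcs": [SKIP, FILLER]})
    return network


def label_path(network, path):
    """Map one decoded arc per slot to (error_type, canonical_phone) pairs."""
    if len(path) != len(network):
        raise ValueError("path must contain exactly one arc per slot")
    errors = []
    for slot, arc in zip(network, path):
        if arc not in slot["arcs"]:
            raise ValueError(f"arc {arc!r} is not allowed in this slot")
        if slot["kind"] == "gap":
            if arc == FILLER:
                errors.append(("insertion", None))
        elif arc is SKIP:
            errors.append(("deletion", slot["canonical"]))
        elif arc != slot["canonical"]:
            errors.append(("substitution", slot["canonical"]))
    return errors


def diagnosis_candidates(error, inventory):
    """Phones a second pass could try for one detected error (assumed rule)."""
    kind, canonical = error
    if kind == "deletion":
        return []  # nothing was spoken, so there is no identity to recover
    # For an insertion, canonical is None, so every phone is a candidate.
    return [p for p in inventory if p != canonical]


network = build_detection_network(["k", "ae", "t"])
path = [SKIP, "k", SKIP, "anti_ae", FILLER, SKIP, SKIP]
errors = label_path(network, path)
```

Here the decoded path takes the antiphone of `ae`, a filler after it, and a skip at `t`, so `errors` holds one substitution, one insertion, and one deletion. In a real system the arcs would be scored by acoustic models rather than supplied by hand.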
