Conference: IASTED International Conference on Artificial Intelligence and Applications

OPTIMIZING MULTIPLE PRONUNCIATION DICTIONARY BASED ON A CONFUSABILITY MEASURE FOR NON-NATIVE SPEECH RECOGNITION

Abstract

This paper addresses efficient pronunciation variation modeling for non-native automatic speech recognition (ASR), where non-native speech differs from native speech mainly in pronunciation. To improve the performance of non-native ASR, we first propose a multiple pronunciation dictionary built with an indirect data-driven approach. However, this approach enlarges the search space for ASR decoding because the dictionary grows. We therefore propose a method for optimizing the size of the multiple pronunciation dictionary by removing confusable pronunciation variants from it. To this end, we also propose a confusability measure based on the Levenshtein distance between two different pronunciation variants. In addition, the number of phonemes in each pronunciation variant is used to optimize the dictionary size. To investigate the effect of the proposed approach on ASR performance, English is selected as the target language, and English utterances spoken by Koreans are treated as non-native speech. Continuous non-native ASR experiments show that, compared with an ASR system using the multiple pronunciation dictionary without optimization, the system using the optimized dictionary reduces the average word error rate by 13.53% while lowering computational complexity by 21.10%, in relative terms.
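
The abstract does not give the exact form of the confusability measure or the pruning rule, so the following is only a minimal sketch of the general idea. It computes the Levenshtein distance over phoneme sequences and then applies a hypothetical rule: an added variant is dropped when it lies within `max_distance` edits of the canonical pronunciation of a different word. The function names, the `max_distance` parameter, the convention that the first entry is the canonical pronunciation, and the ARPAbet-style toy lexicon are all assumptions made for illustration. The use of phoneme counts mentioned in the abstract is not modeled here.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two symbol sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def prune_variants(lexicon: Dict[str, List[List[str]]],
                   max_distance: int = 0) -> Dict[str, List[List[str]]]:
    """Hypothetical pruning rule (not taken from the paper).

    The first pronunciation of each word is treated as canonical and is
    always kept. Any further variant is dropped if it is within
    `max_distance` edits of another word's canonical pronunciation.
    """
    pruned = {}
    for word, variants in lexicon.items():
        others = [v[0] for w, v in lexicon.items() if w != word]
        kept = [variants[0]]
        for variant in variants[1:]:
            if all(levenshtein(variant, o) > max_distance for o in others):
                kept.append(variant)
        pruned[word] = kept
    return pruned


# Toy lexicon (assumed data): canonical pronunciation first, then variants.
lexicon = {
    "light": [["L", "AY", "T"], ["R", "AY", "T"]],
    "right": [["R", "AY", "T"], ["L", "AY", "T"]],
    "three": [["TH", "R", "IY"], ["S", "R", "IY"]],
}
pruned = prune_variants(lexicon)
```

In this toy example the variants of "light" and "right" that coincide with each other's canonical form are removed, while the variant of "three" survives because it is not close to any other word. Raising `max_distance` prunes more aggressively and shrinks the dictionary further, trading variant coverage for a smaller decoding search space.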