INTERSPEECH 2012

Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation

Abstract

In recent work, we proposed an acoustic data-driven grapheme-to-phoneme (G2P) conversion approach, in which the probabilistic relationship between graphemes and phonemes, learned from acoustic data, is used together with the orthographic transcription of a word to infer its phoneme sequence. In this paper, we extend that study to the problem of lexicon development for under-resourced settings. More precisely, given a small amount of transcribed speech data covering only a few words, along with the pronunciation lexicon for those words, the goal is to build a pronunciation lexicon for unseen words. In this framework, we compare our G2P approach with the standard letter-to-sound (L2S) rule-based conversion approach. We evaluated the generated lexicons on the PhoneBook 600-word task in terms of pronunciation errors and ASR performance. The G2P approach yields a best ASR performance of 14.0% word error rate (WER), while the L2S approach yields a best ASR performance of 13.7% WER. Combining the G2P and L2S approaches yields a best ASR performance of 9.3% WER.
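
The WER figures above follow the usual definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference, divided by the number of reference words. The sketch below is only an illustration of that metric, not the authors' scoring tool; the function name `word_error_rate` and the example strings are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Illustrative helper (hypothetical name); not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

For an isolated-word task, where each utterance is a single word, this reduces to the fraction of utterances recognized incorrectly.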