Conference: International Conference on Statistical Language and Speech Processing

Pronunciation Extraction from Phoneme Sequences through Cross-Lingual Word-to-Phoneme Alignment


Abstract

Using written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units with our new alignment model, Model 3P [17]. From this segmentation, we deduce phonetic transcriptions of target-language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant for bootstrapping dictionaries from audio data for Automatic Speech Recognition and for bypassing the written form in Speech-to-Speech Translation, particularly for under-resourced languages and languages that are not written at all. An analysis of 14 translations in 9 languages, used to build a dictionary for English, shows that the quality of the resulting dictionary is better when the vocabulary sizes of the source and target languages are close, sentences are shorter, words are repeated more often, and translations are formally equivalent.
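The dictionary-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce Model 3P: it assumes the word-to-phoneme alignment has already been computed, that each source word maps to exactly one target word, and that the most frequent phoneme segment aligned to a source word is taken as the pronunciation. The function name `extract_dictionary`, the `W<n>` word-ID format, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_dictionary(aligned_sentences):
    """Build a toy pronunciation dictionary from pre-aligned data.

    aligned_sentences: list of sentences, each a list of
    (source_word, phoneme_segment) pairs, where phoneme_segment is a
    sequence of phoneme strings aligned to that source word.
    Assumption: the alignment itself comes from some external model.
    """
    # Count how often each phoneme segment is aligned to each source word.
    candidates = defaultdict(Counter)
    for sentence in aligned_sentences:
        for source_word, segment in sentence:
            if segment:  # skip source words aligned to no phonemes
                candidates[source_word][tuple(segment)] += 1

    # Assign a word ID per source word (hypothetical ID scheme) and keep
    # the most frequent segment as that entry's pronunciation.
    dictionary = {}
    for index, source_word in enumerate(sorted(candidates)):
        pronunciation, _count = candidates[source_word].most_common(1)[0]
        dictionary[f"W{index}"] = {
            "source_word": source_word,
            "pronunciation": list(pronunciation),
        }
    return dictionary


# Hypothetical example: Spanish source words aligned to English phonemes.
aligned = [
    [("casa", ("HH", "AW", "S")), ("grande", ("B", "IH", "G"))],
    [("casa", ("HH", "AW", "S"))],
    [("casa", ("HH", "AW", "Z"))],  # a noisy alignment, outvoted above
]
dictionary = extract_dictionary(aligned)
```

Picking the most frequent segment is one simple way to absorb alignment noise; it also hints at why more word repetitions in the data would help, since a word seen only once has no competing candidates to vote against an erroneous segment.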
