Journal: Computer Speech and Language

Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

Abstract

In this paper, we study methods to discover words and extract their pronunciations from audio data for non-written and under-resourced languages. We examine the potential and the challenges of extracting pronunciations from phoneme sequences through cross-lingual word-to-phoneme alignment. In our scenario, a human translator speaks utterances in the (non-written) target language in response to prompts in a resource-rich source language. We use these source language prompts to support the word discovery and pronunciation extraction process. By aligning the source language words to the target language phonemes, we segment the phoneme sequences into word-like chunks. These chunks are interpreted as putative word pronunciations, but they are highly prone to alignment and phoneme recognition errors. We therefore propose Model 3P, an alignment model designed specifically for cross-lingual word-to-phoneme alignment. We present and compare two methods, source-word-dependent and source-word-independent clustering, that extract word pronunciations from word-to-phoneme alignments. We show that both methods compensate for phoneme recognition and alignment errors. To evaluate our alignment model and error recovery methods, we also extract from the Christian Bible a parallel corpus consisting of 15 different translations in 10 languages. For example, starting from noisy target language phoneme sequences with a 45.1% error rate, we use a Spanish Bible translation as the source language to build a pronunciation dictionary for an English Bible with a 4.5% OOV rate, and 64% of the extracted pronunciations contain no more than one wrong phoneme. Finally, we use the extracted pronunciations in an automatic speech recognition system for the target language and report promising word error rates, considering that the pronunciation dictionary and language model are learned in a completely unsupervised manner and that our approach requires no written form of the target language.
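
The segmentation and source-word-dependent extraction steps described above can be illustrated with a minimal Python sketch. It assumes the word-to-phoneme alignment is already given as a list mapping each target phoneme to a source word index (it is not produced by Model 3P here), and it replaces the paper's clustering with a much simpler stand-in: for each source word, pick the medoid chunk under phoneme edit distance. The function names `segment`, `extract_pronunciations`, and `levenshtein`, and the toy data, are hypothetical and only for illustration.

```python
from collections import defaultdict
from collections.abc import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two phoneme sequences (unit cost for sub/ins/del)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete pa
                            curr[j - 1] + 1,            # insert pb
                            prev[j - 1] + (pa != pb)))  # match or substitute
        prev = curr
    return prev[-1]


def segment(phonemes, alignment, source_words):
    """Split a target phoneme sequence into word-like chunks.

    Hypothetical input format: alignment[k] is the index of the source word
    aligned to phonemes[k]; consecutive phonemes aligned to the same source
    word form one chunk. Returns a list of (source_word, chunk) pairs.
    """
    chunks = []
    for ph, w in zip(phonemes, alignment):
        if chunks and chunks[-1][0] == w:
            chunks[-1][1].append(ph)
        else:
            chunks.append((w, [ph]))
    return [(source_words[w], tuple(c)) for w, c in chunks]


def extract_pronunciations(chunk_pairs):
    """Simplified source-word-dependent extraction (a stand-in, not the
    paper's clustering): for each source word, keep the chunk with the
    smallest total edit distance to all chunks aligned to that word."""
    by_word = defaultdict(list)
    for word, chunk in chunk_pairs:
        by_word[word].append(chunk)
    return {
        word: min(chunks, key=lambda c: sum(levenshtein(c, o) for o in chunks))
        for word, chunks in by_word.items()
    }


if __name__ == "__main__":
    # Toy parallel data: (source words, target phonemes, alignment).
    corpus = [
        (["the", "dog"], ["e", "l", "p", "e", "r", "o"], [0, 0, 1, 1, 1, 1]),
        (["a", "dog"], ["u", "n", "p", "e", "r", "o"], [0, 0, 1, 1, 1, 1]),
        (["dog"], ["b", "e", "r", "o"], [0, 0, 0, 0]),  # recognition error: p -> b
    ]
    pairs = []
    for words, phonemes, alignment in corpus:
        pairs.extend(segment(phonemes, alignment, words))
    for word, pron in sorted(extract_pronunciations(pairs).items()):
        print(word, " ".join(pron))
```

On this toy data the noisy chunk `b e r o` is outvoted by the two `p e r o` occurrences aligned to "dog". In a very reduced form, this mirrors how aggregating many alignments of the same source word can compensate for individual phoneme recognition errors.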