Conference: 2012 IEEE Workshop on Spoken Language Technology

Word segmentation through cross-lingual word-to-phoneme alignment

Abstract

We present Model 3P, our new alignment model for cross-lingual word-to-phoneme alignment, and show that unsupervised learning of word segmentation is more accurate when information from another language is used. Word segmentation with cross-lingual information is highly relevant for bootstrapping pronunciation dictionaries from audio data for Automatic Speech Recognition, bypassing the written form in Speech-to-Speech Translation, or building the vocabulary of an unseen language, particularly in the context of under-resourced languages. Using Model 3P to align English words with Spanish phonemes outperforms a state-of-the-art monolingual word segmentation approach [1] on the BTEC corpus [2] by up to 42% absolute in F-score on the phoneme level, and a GIZA++ alignment based on IBM Model 3 by up to 17%.
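
To make the idea concrete, the following is a minimal Python sketch of how a word-to-phoneme alignment can induce a word segmentation, and how such a segmentation might be scored. It is not Model 3P: the alignment is supplied by hand, whereas the paper learns it. The function names, the example sentence, and the simplified phoneme symbols are all hypothetical. The score shown is a boundary-based F-score, which is an assumption; the abstract only states that F-score is measured on the phoneme level and does not give the exact definition.

```python
from typing import List, Set


def segment_from_alignment(phonemes: List[str], aligned_word: List[int]) -> List[List[str]]:
    """Group consecutive phonemes that are aligned to the same source word.

    aligned_word[i] is the index of the source-language word that phoneme i
    is aligned to (an assumed, hand-written alignment in this sketch).
    """
    if len(phonemes) != len(aligned_word):
        raise ValueError("need exactly one aligned word index per phoneme")
    segments: List[List[str]] = []
    for i, phoneme in enumerate(phonemes):
        # Start a new segment whenever the aligned source word changes.
        if i == 0 or aligned_word[i] != aligned_word[i - 1]:
            segments.append([])
        segments[-1].append(phoneme)
    return segments


def boundaries(segments: List[List[str]]) -> Set[int]:
    """Return internal boundary positions as phoneme offsets (utterance end excluded)."""
    positions: Set[int] = set()
    offset = 0
    for segment in segments[:-1]:
        offset += len(segment)
        positions.add(offset)
    return positions


def boundary_f_score(hyp: List[List[str]], ref: List[List[str]]) -> float:
    """Harmonic mean of boundary precision and recall (assumed metric)."""
    hyp_b, ref_b = boundaries(hyp), boundaries(ref)
    if not hyp_b and not ref_b:
        return 1.0  # both segmentations are a single word
    correct = len(hyp_b & ref_b)
    if correct == 0:
        return 0.0
    precision = correct / len(hyp_b)
    recall = correct / len(ref_b)
    return 2 * precision * recall / (precision + recall)


# English source words: "where"(0) "is"(1) "the"(2) "station"(3)
# Spanish target phonemes for "dónde está la estación" (simplified symbols).
phonemes = "d o n d e e s t a l a e s t a s j o n".split()

# Hypothetical alignment that wrongly attaches the "a" of "la" to "station".
aligned_word = [0] * 5 + [1] * 4 + [2] * 1 + [3] * 9

hyp = segment_from_alignment(phonemes, aligned_word)
ref = [phonemes[0:5], phonemes[5:9], phonemes[9:11], phonemes[11:19]]

print([" ".join(segment) for segment in hyp])
print(round(boundary_f_score(hyp, ref), 3))  # 0.667
```

In this toy example two of the three hypothesized boundaries match the reference, so precision and recall are both 2/3. Note that the grouping rule merges adjacent target words that happen to be aligned to the same source word, which is one reason the quality of the alignment model matters for the resulting segmentation.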
