Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing

Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages



Abstract

Grapheme-to-phoneme conversion (g2p) is the task of predicting the pronunciation of words from their orthographic representation. Historically, g2p systems were transition- or rule-based, making generalization beyond a monolingual (high-resource) domain impractical. Recently, neural architectures have enabled multilingual systems to generalize widely; however, all systems to date have been trained only on spelling-pronunciation pairs. We hypothesize that the sequences of IPA characters used to represent pronunciation do not capture its full nuance, especially when cleaned to facilitate machine learning. We leverage audio data as an auxiliary modality in a multi-task training process to learn a more optimal intermediate representation of source graphemes; this is the first multimodal model proposed for multilingual g2p. Our approach is highly effective: on our in-domain test set, our multimodal model reduces phoneme error rate to 2.46%, a more than 65% decrease compared to our implementation of a unimodal spelling-pronunciation model, which itself achieves state-of-the-art results on the Wiktionary test set. The advantages of the multimodal model generalize to wholly unseen languages, reducing phoneme error rate on our out-of-domain test set to 6.39% from the unimodal 8.21%, a more than 20% relative decrease. Furthermore, our training and test sets are composed primarily of low-resource languages, demonstrating that our multimodal approach remains useful when training data are constrained.
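The abstract reports results as phoneme error rate (PER), the standard g2p metric: Levenshtein edit distance between predicted and reference phoneme sequences, normalized by reference length. A minimal sketch of that computation is below; the function names are illustrative and not taken from the paper's code.

```python
# Illustrative sketch: corpus-level phoneme error rate (PER) as commonly
# defined for g2p evaluation (edit distance / reference length).

def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deletions to reach empty hypothesis
    for j in range(n + 1):
        d[0][j] = j  # insertions from empty reference
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[m][n]

def phoneme_error_rate(refs, hyps):
    """Total edits over total reference phonemes, across the corpus."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return edits / total

# Example: reference /k æ t/ predicted as /k a t/ is one substitution
# out of three phonemes.
per = phoneme_error_rate([["k", "æ", "t"]], [["k", "a", "t"]])
print(round(per, 4))  # 0.3333
```

A reported PER of 2.46% therefore means roughly one phoneme edit per forty reference phonemes, aggregated over the test set.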


