Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing

Multimodal, Multilingual Grapheme-to-Phoneme Conversion for Low-Resource Languages



Abstract

Grapheme-to-phoneme conversion (g2p) is the task of predicting the pronunciation of words from their orthographic representation. Historically, g2p systems were transition- or rule-based, making generalization beyond a monolingual (high-resource) domain impractical. Recently, neural architectures have enabled multilingual systems to generalize widely; however, all systems to date have been trained only on spelling-pronunciation pairs. We hypothesize that the sequences of IPA characters used to represent pronunciation do not capture its full nuance, especially when cleaned to facilitate machine learning. We leverage audio data as an auxiliary modality in a multi-task training process to learn a more optimal intermediate representation of source graphemes; this is the first multimodal model proposed for multilingual g2p. Our approach is highly effective: on our in-domain test set, our multimodal model reduces phoneme error rate to 2.46%, a more than 65% decrease compared to our implementation of a unimodal spelling-pronunciation model, which itself achieves state-of-the-art results on the Wiktionary test set. The advantages of the multimodal model generalize to wholly unseen languages, reducing phoneme error rate on our out-of-domain test set to 6.39% from the unimodal 8.21%, a more than 20% relative decrease. Furthermore, our training and test sets are composed primarily of low-resource languages, demonstrating that our multimodal approach remains useful when training data are constrained.
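The abstract reports results as phoneme error rate (PER), the standard g2p metric: Levenshtein edit distance between predicted and reference phoneme sequences, normalized by reference length. A minimal sketch of that computation is below; the function names are illustrative and not taken from the paper's code.

```python
# Illustrative sketch: corpus-level phoneme error rate (PER) as commonly
# defined for g2p evaluation (edit distance / reference length).

def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deletions to reach empty hypothesis
    for j in range(n + 1):
        d[0][j] = j  # insertions from empty reference
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[m][n]

def phoneme_error_rate(refs, hyps):
    """Total edits over total reference phonemes, across the corpus."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return edits / total

# Example: reference /k æ t/ predicted as /k a t/ is one substitution
# out of three phonemes.
per = phoneme_error_rate([["k", "æ", "t"]], [["k", "a", "t"]])
print(round(per, 4))  # 0.3333
```

A reported PER of 2.46% therefore means roughly one phoneme edit per forty reference phonemes, aggregated over the test set.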


