Conference: Fourth Workshop on South and Southeast Asian Natural Language Processing

Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language

Abstract

This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a locally dominant language (Malay) and a severely under-resourced language (Iban, spoken in Sarawak and in several other parts of the island of Borneo) for which almost no resources or linguistic knowledge are available. More precisely, a pre-existing Malay G2P system is used to produce phoneme sequences for Iban words. These phoneme sequences are then manually post-edited (corrected) by a native Iban speaker. The resulting resource, produced in a semi-supervised fashion, is then used to train the first G2P system for the Iban language. As a by-product of this methodology, an analysis of the "pronunciation distance" between Malay and Iban sheds light on the phonological and orthographic relations between the two languages. The experiments show that a reasonably efficient Iban G2P system can be obtained after only two hours of post-editing (correction) of the Malay G2P output applied to Iban words.
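The abstract does not define how the "pronunciation distance" is computed. One plausible measure, offered here only as a minimal sketch under that assumption, is a phoneme-level Levenshtein distance between the Malay G2P output for an Iban word and the native speaker's post-edited pronunciation, summed over the word list and normalized by the number of corrected phonemes (in effect, a phoneme error rate of the Malay G2P on Iban). The function names `phoneme_edit_distance` and `pronunciation_distance` are hypothetical, and the phoneme sequences in the usage example are invented toy data, not transcriptions from the paper.

```python
from typing import Sequence, Tuple


def phoneme_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols, with unit cost for
    insertion, deletion, and substitution."""
    # prev[j] holds the distance between the previous prefix of hyp and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete phoneme h from the G2P output
                curr[j - 1] + 1,         # insert phoneme r added by the post-editor
                prev[j - 1] + (h != r),  # substitute (zero cost if identical)
            )
        prev = curr
    return prev[-1]


def pronunciation_distance(
    pairs: Sequence[Tuple[Sequence[str], Sequence[str]]],
) -> float:
    """Hypothetical corpus-level distance: total edits needed to turn the
    Malay G2P output into the post-edited Iban pronunciations, divided by
    the total number of post-edited (reference) phonemes."""
    edits = sum(phoneme_edit_distance(hyp, ref) for hyp, ref in pairs)
    total = sum(len(ref) for _, ref in pairs)
    return edits / total if total else 0.0


# Toy (Malay G2P output, post-edited pronunciation) pairs; illustrative only.
toy_pairs = [
    (["p", "a", "t", "a"], ["p", "a", "t", "ə"]),  # one substitution
    (["k", "a", "m", "i"], ["k", "a", "m", "i"]),  # no correction needed
]
print(pronunciation_distance(toy_pairs))  # 0.125
```

Normalizing by the reference length keeps short and long words comparable. Backtracking through the same dynamic-programming table would also yield the phoneme alignments, from which systematic Malay-to-Iban substitutions could be counted; the corrected pairs themselves are the kind of training data a G2P toolkit would then consume.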
