Using closely-related language to build an ASR for a very under-resourced language: Iban

Abstract

This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is spoken in Sarawak, a Malaysian state on Borneo. Because no ASR resources existed for this language, we began by collecting 8 hours of speech data. To compensate for the lack of resources, we applied bootstrapping techniques using a closely related language to build the Iban system. Specifically, we used Malay data to bootstrap the grapheme-to-phoneme (G2P) system for the target language. We also developed several G2P systems to produce Iban pronunciation dictionaries, which were then evaluated in the Iban ASR system to select the best version. Subsequently, we conducted cross-lingual ASR experiments using subspace Gaussian mixture models (SGMM), in which the shared parameters were obtained in either a monolingual or a multilingual fashion. We observed that using out-of-language data as the source language yielded a lower word error rate (WER) when Iban data is very limited.
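
The abstract reports its results in terms of WER. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the placeholder tokens are illustrative assumptions, not taken from the paper, and the paper's own scoring tool is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; splits on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Iban text: one substitution plus one deletion
# against a four-word reference.
example_wer = word_error_rate("w1 w2 w3 w4", "w1 wX w3")
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.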