Conference: International Conference on Advanced Computer Science and Information Systems

Hierarchical Transfer Learning for Text-to-Speech in Indonesian, Javanese, and Sundanese Languages


Abstract

This research develops end-to-end, deep-learning-based text-to-speech (TTS) for Indonesian, Javanese, and Sundanese. Although end-to-end neural TTS systems such as Tacotron-2 have made remarkable progress recently, they still suffer from data scarcity in low-resource languages such as Javanese and Sundanese. Our preliminary study shows that Tacotron-2-based TTS needs a large amount of training data: at least 10 hours are required for the model to synthesize intelligible speech of acceptable quality. To address this low-resource problem, we propose a hierarchical transfer learning scheme for training Javanese and Sundanese TTS that exploits a dissimilar high-resource language (English) and a similar intermediate-resource language (Indonesian). The synthesized speech reaches a mean opinion score (MOS) of 4.27 for Indonesian, 4.08 for Javanese, and 3.92 for Sundanese. Word accuracy (WAcc) on semantically unpredictable sentences (SUS) reaches 98.26% for Indonesian, 95.02% for Javanese, and 95.43% for Sundanese. These subjective evaluations of synthetic speech quality demonstrate that our transfer learning scheme can be successfully applied to TTS models for a low-resource target domain. Using less than one hour of training data (38 minutes for Indonesian, 16 minutes for Javanese, and 19 minutes for Sundanese), the TTS models learn quickly and achieve adequate performance.
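The staging described in the abstract (English first, then Indonesian, then Javanese and Sundanese, each stage starting from its parent's weights) can be illustrated with a minimal sketch. This is not the authors' Tacotron-2 pipeline: the two-parameter linear model, the `fine_tune` and `run_hierarchy` helpers, the synthetic datasets, and the step counts are all hypothetical stand-ins chosen only to show how a warm-started training chain is wired together.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (input, target) pair for the toy model


@dataclass(frozen=True)
class Stage:
    language: str
    parent: Optional[str]  # language whose trained weights initialise this stage
    steps: int             # hypothetical training budget, a proxy for data size


# Stage order follows the abstract; the step counts are illustrative assumptions.
STAGES = [
    Stage("English", None, 500),
    Stage("Indonesian", "English", 20),
    Stage("Javanese", "Indonesian", 5),
    Stage("Sundanese", "Indonesian", 5),
]


def mse(params: List[float], data: List[Sample]) -> float:
    """Mean squared error of the toy model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(init: List[float], data: List[Sample], steps: int,
              lr: float = 0.05) -> List[float]:
    """Toy stand-in for TTS training: gradient descent starting from `init`."""
    w, b = init
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def run_hierarchy(stages: List[Stage],
                  datasets: Dict[str, List[Sample]]) -> Dict[str, List[float]]:
    """Train each stage from its parent's weights (parents must come first)."""
    trained: Dict[str, List[float]] = {}
    for stage in stages:
        init = [0.0, 0.0] if stage.parent is None else list(trained[stage.parent])
        trained[stage.language] = fine_tune(init, datasets[stage.language], stage.steps)
    return trained


def make_data(w: float, b: float) -> List[Sample]:
    """Synthetic task: related 'languages' get nearby (w, b) values."""
    return [(x, w * x + b) for x in (0.0, 1.0, 2.0, 3.0)]


DATASETS = {
    "English": make_data(2.0, 1.0),
    "Indonesian": make_data(2.1, 0.9),
    "Javanese": make_data(2.2, 0.8),
    "Sundanese": make_data(2.15, 0.85),
}

models = run_hierarchy(STAGES, DATASETS)
# Baseline: same small budget for Javanese, but without any transferred weights.
scratch = fine_tune([0.0, 0.0], DATASETS["Javanese"], steps=5)
print("warm-start loss:", mse(models["Javanese"], DATASETS["Javanese"]))
print("from-scratch loss:", mse(scratch, DATASETS["Javanese"]))
```

In this toy setting the warm-started Javanese model ends with a lower loss than the from-scratch baseline given the same small budget, which mirrors the motivation stated in the abstract. It says nothing about the magnitude of the gain in a real TTS system.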
