Published in: International Conference on Asian Language Processing

Indonesian Corpus Constructing and Text Processing for Speech Synthesis

Abstract

This paper addresses the development of an Indonesian speech synthesis system, focusing on Indonesian text analysis and processing: selection of the pronunciation corpus, text normalization, and syllable segmentation. Using a selection principle that combines high-frequency words with sentence length, we selected 5,000 sentences from a 566 MB raw text corpus as the pronunciation corpus. Numbers in the text are normalized with a combination of regular expressions and keywords, and syllable segmentation is performed with a combination of syllable lists and special rules. The experimental results show that the proposed methods lay a good foundation for developing the Indonesian speech synthesis system.
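The abstract gives no implementation details for the number normalization step, so the following is only a minimal sketch of how regular expressions and keywords might be combined for this purpose. The function names `number_to_words` and `normalize_numbers`, the choice of `Rp` as the only keyword, and the supported range (0 to 999,999,999) are assumptions made for illustration, not the authors' method.

```python
import re

# Indonesian words for the digits 0-9.
UNITS = ["nol", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "delapan", "sembilan"]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder, if any, to the leading words."""
    return head if rest == 0 else head + " " + number_to_words(rest)


def number_to_words(n: int) -> str:
    """Spell out an integer in Indonesian (hypothetical helper, 0..999,999,999)."""
    if n < 0 or n >= 1_000_000_000:
        raise ValueError("only 0..999,999,999 is handled in this sketch")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        return _join(UNITS[tens] + " puluh", rest)
    if n < 200:
        return _join("seratus", n - 100)      # "se-" prefix replaces "satu"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _join(UNITS[hundreds] + " ratus", rest)
    if n < 2000:
        return _join("seribu", n - 1000)      # "se-" prefix replaces "satu"
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return _join(number_to_words(thousands) + " ribu", rest)
    millions, rest = divmod(n, 1_000_000)
    return _join(number_to_words(millions) + " juta", rest)


# Keyword rule (assumed): the currency marker "Rp" before a number.
CURRENCY = re.compile(r"\bRp\.?\s*(\d+)")
# Fallback rule: any remaining run of digits is read as a cardinal number.
PLAIN = re.compile(r"\d+")


def normalize_numbers(text: str) -> str:
    """Replace digit strings with Indonesian words; keyword rules run first."""
    text = CURRENCY.sub(
        lambda m: number_to_words(int(m.group(1))) + " rupiah", text)
    return PLAIN.sub(lambda m: number_to_words(int(m.group())), text)


print(normalize_numbers("Harga buku itu Rp 15000 pada tahun 2021."))
```

Keyword rules are applied before the generic digit rule so that the context-dependent reading (here, a currency amount) takes precedence. A fuller system would likely need further keyword rules, for example for dates, ordinals, and numbers written with thousands separators, which this sketch does not attempt.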
