Conference: International Conference on Asian Language Processing

Indonesian Corpus Constructing and Text Processing for Speech Synthesis

Abstract

This paper focuses on the development of an Indonesian speech synthesis system and studies methods for analyzing and processing Indonesian text, mainly pronunciation corpus selection, text normalization, and syllable segmentation. Using a selection principle that combines high-frequency words with sentence length, we selected 5,000 sentences as the pronunciation corpus from a 566 MB raw text corpus. Numbers in the text are normalized using a combination of regular expressions and keywords, and syllable segmentation is performed using a combination of syllable lists and special rules. Experimental results show that these methods lay a good foundation for the development of the Indonesian speech synthesis system.
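The abstract does not state the exact selection criterion. The Python sketch below shows one plausible way to combine high-frequency words and sentence length: keep only sentences inside an assumed word-count window, then greedily take the sentence that covers the most not-yet-covered high-frequency words. The function names, the `top_n`, `min_len`, and `max_len` parameters, and the greedy coverage score are illustrative assumptions, not the authors' published method.

```python
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase and keep alphabetic words; reduplications like 'anak-anak' stay whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())


def select_sentences(sentences: list[str], k: int, top_n: int = 1000,
                     min_len: int = 5, max_len: int = 20) -> list[str]:
    """Greedily pick up to k sentences covering the most high-frequency words.

    Assumed criterion (hypothetical, not from the paper): a sentence is eligible
    only if its word count lies in [min_len, max_len]; each step takes the
    eligible sentence that adds the most not-yet-covered words among the
    top_n most frequent words of the whole corpus.
    """
    freq = Counter(w for s in sentences for w in tokenize(s))
    high_freq = {w for w, _ in freq.most_common(top_n)}
    # Sentence-length filter.
    candidates = [s for s in sentences if min_len <= len(tokenize(s)) <= max_len]
    covered: set[str] = set()
    selected: list[str] = []

    def gain(s: str) -> set[str]:
        # High-frequency words in s that no selected sentence has covered yet.
        return (set(tokenize(s)) & high_freq) - covered

    while candidates and len(selected) < k:
        best = max(candidates, key=lambda s: len(gain(s)))
        new_words = gain(best)
        if not new_words:  # nothing left to cover; stop early
            break
        selected.append(best)
        covered |= new_words
        candidates.remove(best)
    return selected
```

For example, with the toy corpus `["saya makan nasi", "saya makan roti", "saya minum air", "x"]`, `select_sentences(corpus, k=2, top_n=2, min_len=2, max_len=5)` returns only `["saya makan nasi"]`: it already covers both high-frequency words (`saya`, `makan`), and `"x"` fails the length filter. Greedy coverage is a common choice for script design because it favors variety over repetition, but a real system would likely score phone or syllable coverage as well.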

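Number normalization with regular expressions and keywords can be sketched as follows. This is a minimal illustration assuming Indonesian conventions (a dot as the thousands separator, a comma as the decimal mark), expansion of `Rp` and `%` to `rupiah` and `persen`, and a hypothetical keyword set (`DIGIT_KEYWORDS`) after which digits are read one by one, as for phone numbers. The paper's actual patterns and keyword inventory are not given in the abstract.

```python
import re

UNITS = ["nol", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "delapan", "sembilan"]
# Hypothetical keywords after which digits are read one by one (e.g. phone numbers).
DIGIT_KEYWORDS = {"telepon", "telp", "hp", "nomor"}

# Optional "Rp" prefix, integer part (dot-grouped thousands or plain digits),
# optional decimal part after a comma, optional percent sign.
NUMBER_RE = re.compile(
    r"(?P<cur>Rp\.?\s?)?(?P<num>\d{1,3}(?:\.\d{3})+|\d+)(?:,(?P<dec>\d+))?(?P<pct>%)?"
)


def say_number(n: int) -> str:
    """Spell out a non-negative integer as Indonesian cardinal words."""
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        return UNITS[tens] + " puluh" + ("" if rest == 0 else " " + UNITS[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else UNITS[hundreds] + " ratus"
        return head + ("" if rest == 0 else " " + say_number(rest))
    for value, word in ((10**9, "miliar"), (10**6, "juta"), (10**3, "ribu")):
        if n >= value:
            count, rest = divmod(n, value)
            # Fused "seribu" for 1,000; this sketch uses "satu juta", "satu miliar" above that.
            head = "seribu" if value == 1000 and count == 1 else f"{say_number(count)} {word}"
            return head + ("" if rest == 0 else " " + say_number(rest))
    raise AssertionError("unreachable")


def normalize_numbers(text: str) -> str:
    """Replace digit strings in text with Indonesian words."""
    def expand(m: re.Match) -> str:
        digits = m.group("num").replace(".", "")  # drop thousands separators
        before = text[:m.start()].split()
        prev = before[-1].lower().rstrip(":") if before else ""
        if prev in DIGIT_KEYWORDS:  # keyword rule: read digit by digit
            words = " ".join(UNITS[int(d)] for d in digits)
        else:  # default rule: read as a cardinal number
            words = say_number(int(digits))
        if m.group("dec"):
            words += " koma " + " ".join(UNITS[int(d)] for d in m.group("dec"))
        if m.group("pct"):
            words += " persen"
        if m.group("cur"):
            words += " rupiah"
        return words

    return NUMBER_RE.sub(expand, text)
```

For example, `normalize_numbers("Harganya Rp 1.500.000")` returns `"Harganya satu juta lima ratus ribu rupiah"`, while `normalize_numbers("telepon 0812")` reads the digits individually as `"telepon nol delapan satu dua"`. The regular expression finds candidate numbers; the keyword check decides how each one is read.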

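Syllable segmentation that combines a list with special rules can be sketched as below. Here the "special rules" are the standard Indonesian word-division rules, and a small exception dictionary stands in for the syllable list. Assumptions: `kh`, `ng`, `ny`, and `sy` are single consonants; `ai`, `au`, `ei`, and `oi` are kept together only word-finally or before a consonant followed by a vowel (a heuristic that separates *pan-tai* from *ma-in*); adjacent vowels or a single consonant between vowels split before the consonant; two consonants split between them; three or more split after the first. `EXCEPTIONS` and its entry are hypothetical placeholders, not the paper's list.

```python
DIGRAPHS = ("kh", "ng", "ny", "sy")  # each spells a single consonant
VOWELS = set("aeiou")
DIPHTHONGS = {"ai", "au", "ei", "oi"}
# Hypothetical exception list standing in for a curated syllable dictionary.
EXCEPTIONS = {"kaus": ["kaus"]}


def units(word: str) -> list[str]:
    """Split a word into letters, keeping consonant digraphs together."""
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        out.append(word[i:i + step])
        i += step
    return out


def syllabify(word: str) -> list[str]:
    """Split a single alphabetic Indonesian word into syllables."""
    word = word.lower()
    if word in EXCEPTIONS:  # list lookup overrides the rules
        return list(EXCEPTIONS[word])
    u = units(word)
    is_v = [x in VOWELS for x in u]

    # Find vowel nuclei as (start, end) ranges over units.
    nuclei, i = [], 0
    while i < len(u):
        if not is_v[i]:
            i += 1
            continue
        end = i + 1
        if i + 1 < len(u) and is_v[i + 1] and u[i] + u[i + 1] in DIPHTHONGS:
            nxt = i + 2
            # Heuristic: diphthong only word-finally or before consonant + vowel.
            if nxt == len(u) or (not is_v[nxt] and nxt + 1 < len(u) and is_v[nxt + 1]):
                end = i + 2
        nuclei.append((i, end))
        i = end

    # Place one cut between each pair of neighbouring nuclei.
    cuts = []
    for (_, e1), (s2, _) in zip(nuclei, nuclei[1:]):
        gap = s2 - e1  # number of consonant units between the nuclei
        cuts.append(e1 if gap <= 1 else e1 + 1)
    bounds = [0] + cuts + [len(u)]
    return ["".join(u[a:b]) for a, b in zip(bounds, bounds[1:])]
```

For example, `syllabify("bermain")` gives `['ber', 'ma', 'in']`, `syllabify("saudara")` gives `['sau', 'da', 'ra']`, and `syllabify("instrumen")` gives `['in', 'stru', 'men']`. Words that the rules mishandle are meant to be added to the exception list. Hyphenated reduplications such as *anak-anak* should be split on the hyphen before calling `syllabify`.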