Indonesian Corpus Constructing and Text Processing for Speech Synthesis



Abstract

This paper focuses on the development of an Indonesian speech synthesis system and studies methods for Indonesian text analysis and processing, mainly pronunciation corpus selection, text normalization, and syllable division. Using a selection principle that combines high-frequency words and sentence length, we selected 5000 sentences as the pronunciation corpus from a 566 MB original text corpus. Numbers in the text are normalized using a combination of regular expressions and keywords. Furthermore, a combination of syllable lists and special rules is used to achieve syllable segmentation. The experimental results show that the proposed methods lay a good foundation for the development of the Indonesian speech synthesis system.
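The abstract describes number normalization as a combination of regular expressions and keywords but gives no further detail. The following is a minimal sketch of what such a step might look like, not the authors' implementation. The function names `spell_number` and `normalize`, the regular expression, and the two keywords handled (`Rp` for rupiah amounts and `%` for percentages) are all assumptions chosen for illustration; only the Indonesian number words themselves are standard vocabulary.

```python
import re

# Standard Indonesian digit words; everything else here is an illustrative assumption.
UNITS = ["nol", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "delapan", "sembilan"]


def spell_number(n: int) -> str:
    """Spell out an integer 0 <= n < 1,000,000 in Indonesian words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only covers 0..999,999")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        return UNITS[tens] + " puluh" + (" " + UNITS[rest] if rest else "")
    if n < 200:
        return "seratus" + (" " + spell_number(n - 100) if n > 100 else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return UNITS[hundreds] + " ratus" + (" " + spell_number(rest) if rest else "")
    if n < 2000:
        return "seribu" + (" " + spell_number(n - 1000) if n > 1000 else "")
    thousands, rest = divmod(n, 1000)
    return spell_number(thousands) + " ribu" + (" " + spell_number(rest) if rest else "")


# Hypothetical pattern: optional "Rp" keyword, an integer (optionally written
# with "." as thousands separator), and an optional "%" keyword.
NUMBER = re.compile(
    r"(?P<rp>Rp\.?\s*)?(?P<num>\d{1,3}(?:\.\d{3})+|\d+)(?P<pct>\s*%)?"
)


def normalize(text: str) -> str:
    """Replace digit strings in text with Indonesian words (sketch only)."""
    def repl(match: re.Match) -> str:
        value = int(match.group("num").replace(".", ""))
        if value >= 1_000_000:
            return match.group(0)  # out of range for this sketch: leave as-is
        words = spell_number(value)
        if match.group("rp"):
            words += " rupiah"
        if match.group("pct"):
            words += " persen"
        return words
    return NUMBER.sub(repl, text)


print(normalize("Harga Rp 5.000 naik 12%"))
```

This sketch ignores decimals, dates, ordinals, phone numbers, and digits embedded in other tokens, all of which a production text normalizer would need additional patterns and keywords to handle.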


Bibliographic record

Source
International Conference on Asian Language Processing | 2018 | 383p | 4 pages in total
Authors
Xuan Kong; Jian Yang
Original format: PDF
Chinese Library Classification: TP312-53
Keywords
Speech synthesis; Standards; Dictionaries; Text analysis; Linguistics; Databases; High frequency


Similar literature

1. Constructing a speech audio-video corpus by aligning long segments of speech and text [J]. Karpukhin I. A., Konushin Anton S. Moscow University Computational Mathematics and Cybernetics. 2017, No. 2
2. Emilia: a speech corpus for Argentine Spanish text to speech synthesis [J]. Torres Humberto M., Gurlekian Jorge A., Evin Diego A. Language Resources and Evaluation. 2019, No. 3
3. Indonesian Corpus Constructing and Text Processing for Speech Synthesis [C]. Xuan Kong, Jian Yang. International Conference on Asian Language Processing. 2018
4. Automatic text and speech processing for the detection of dementia [D]. Fraser, Kathleen. 2016
5. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019
6. TEXT PRE-PROCESSING PADA TEXT TO SPEECH SYNTHESIS SYSTEM UNTUK PENUTUR BERBAHASA INDONESIA [O]. Handi Dwi Rachma Bayu Handi. 2011
