Conference: International Conference on Information Technology Systems and Innovation

Study and Implementation of Prosody Manipulation Method For Indonesian Speech Synthesis System



Abstract

A speech synthesis system converts text in a given language into speech. This research focuses on "humanizing" the pronunciation of a speech synthesis system. The main components of the text-to-speech system in this research are eSpeak, the MBROLA id1 database for Indonesian, a Human Speech Corpus database derived from a website that lists the most commonly used words in a country, and three basic types of emotion or intonation: happy, angry, and sad. The emotional filter is developed by manipulating prosody values (especially pitch and duration) using predetermined level factors. In the Human Speech Corpus perception test, the results are 95% for the happy emotion, 96.25% for the angry emotion, and 98.75% for the sad emotion. In the clarity test, the accuracy of the heard speech against the original sentence is 93.3%, and the clarity level of each sentence is 62.8%. In the naturalness test, the accuracy of emotion selection is 75.6%, with 90% for the happy emotion, 73.3% for the angry emotion, and 60% for the sad emotion.
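The abstract does not give the level factors or the implementation, so the following is only a minimal sketch of what scaling pitch and duration by fixed factors could look like. It assumes the prosody is available in the MBROLA `.pho` text format (one phoneme per line: phoneme name, duration in milliseconds, then optional pairs of position-in-percent and pitch in Hz). The names `EMOTION_FACTORS`, `scale_pho_line`, and `apply_emotion`, and all factor values, are hypothetical placeholders and are not taken from the paper.

```python
# Hypothetical (pitch_factor, duration_factor) pairs per emotion.
# These numbers are illustrative assumptions, NOT the paper's level factors.
EMOTION_FACTORS = {
    "happy": (1.2, 0.9),
    "angry": (1.1, 0.8),
    "sad": (0.9, 1.2),
}


def scale_pho_line(line, pitch_factor, duration_factor):
    """Scale one .pho line of the form 'phoneme duration_ms [pos% pitch_hz]...'."""
    stripped = line.strip()
    if not stripped or stripped.startswith(";"):
        return stripped  # comments and blank lines pass through unchanged
    fields = stripped.split()
    phoneme, duration = fields[0], float(fields[1])
    out = [phoneme, str(round(duration * duration_factor))]
    targets = fields[2:]
    # Pitch targets come in (position, pitch) pairs; only the pitch is scaled.
    for i in range(0, len(targets) - 1, 2):
        position, pitch = targets[i], float(targets[i + 1])
        out += [position, str(round(pitch * pitch_factor))]
    return " ".join(out)


def apply_emotion(pho_text, emotion):
    """Apply the chosen emotion's factors to every line of a .pho document."""
    pitch_factor, duration_factor = EMOTION_FACTORS[emotion]
    return "\n".join(
        scale_pho_line(line, pitch_factor, duration_factor)
        for line in pho_text.splitlines()
    )


if __name__ == "__main__":
    sample = "_ 50\na 100 0 120 100 130"
    print(apply_emotion(sample, "sad"))
```

In this sketch a factor above 1 raises the pitch or lengthens the phoneme, and a factor below 1 lowers or shortens it. Whether the paper applies its factors uniformly to every phoneme, as done here, is not stated in the abstract.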
