A Prosodic Diphone Database for Korean Text-to-Speech Synthesis System

Abstract

This paper presents a prosodically conditioned diphone database for use in a Korean text-to-speech (TTS) synthesis system. The diphones are prosodically conditioned in the sense that a single conventional diphone is stored as several versions, each taken directly from a different prosodic domain of prosodically labeled, read sentences (following the K-ToBI prosodic labeling conventions [3]). Four levels of Korean prosodic domains were observed in the diphone selection process, so four different versions of each diphone were selected. A 400-sentence subset of the Korean Newswire Text Corpora was converted to its pronounced form as described in [8], and its read version was prosodically labeled. The greedy algorithm identified 223 sentences containing 1,853 prosodic diphones (out of the 3,977 possible prosodic diphones) that can synthesize all 400 utterances. Although our system cannot synthesize an unlimited number of sentences at this stage, the quality of the synthesized sentences strongly suggests that prosodically conditioned diphones are a viable option for a text-to-speech synthesis system.
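
The abstract does not spell out how the greedy algorithm works, so the following is only a minimal sketch of one common reading: a greedy set cover that repeatedly picks the sentence contributing the most prosodic diphones not yet covered. The function name `greedy_cover`, the representation of a prosodic diphone as a `(diphone, domain)` pair, and the domain labels `D1` to `D4` are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_cover(sentences: Dict[str, Set[Hashable]]) -> List[str]:
    """Pick sentences greedily until every prosodic diphone is covered.

    `sentences` maps a sentence id to the set of prosodic diphones it
    contains. Each unit is assumed (hypothetically) to be a
    (diphone, prosodic_domain) pair.
    """
    # Everything that occurs anywhere in the corpus must be covered.
    uncovered: Set[Hashable] = set().union(*sentences.values())
    chosen: List[str] = []
    while uncovered:
        # Take the sentence that adds the most still-uncovered units.
        best = max(sentences, key=lambda s: len(sentences[s] & uncovered))
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen


# Toy corpus with placeholder domain labels D1..D4 (not from the paper).
corpus = {
    "s1": {("a-n", "D1"), ("n-a", "D2"), ("a-l", "D3")},
    "s2": {("a-n", "D1"), ("n-a", "D2")},
    "s3": {("k-a", "D4"), ("a-l", "D3")},
}
selected = greedy_cover(corpus)
print(selected)
```

On this toy corpus the sketch keeps `s1` and `s3` and drops `s2`, because `s2` adds no unit that `s1` does not already supply. Greedy selection of this kind does not guarantee the smallest possible sentence set, but it is cheap to compute and usually close to it.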
