
Speech synthesis by combining probability distributions from different linguistic levels


Abstract

In a text-to-speech synthesiser that converts text input (15, fig. 1) into audio output (17), a multi-level probability model relating all potential speech vectors to a specific utterance is generated across a range of linguistic levels. The text is first converted into linguistic units at different levels (e.g. syllables on one level, words on another), each unit spanning several frames and carrying an associated linguistic context (e.g. phonetic, prosodic, semantic or syntactic information). Each unit is then related to linear parameters of a speech signal contour through probability distributions (fig. 3) in a model of speech vectors (e.g. fundamental frequency F0, line spectral pairs (LSP), aperiodicity; S405, fig. 4), whose means and variances are determined during training of the system (figs. 6-9). Finally, the probability distributions from all levels are combined, using Bayesian and Expectation-Maximization methods, into a single Gaussian distribution over the speech vector x, characterised by a mean and a variance P.
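The abstract does not spell out how the per-level distributions are merged, so the following is only a minimal sketch of one common way to combine Gaussian estimates: treating each level's prediction as an independent one-dimensional Gaussian and multiplying them, which yields a precision-weighted mean and a reduced variance. The function name `combine_gaussians`, the scalar (rather than vector) setting, the independence assumption, and the example log-F0 values are all illustrative assumptions, not details taken from the patent.

```python
from typing import List, Tuple


def combine_gaussians(levels: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine independent 1-D Gaussians given as (mean, variance) pairs.

    Assumption: the levels are independent, so their product is again
    Gaussian, with precision (1/variance) equal to the sum of precisions.
    """
    if not levels:
        raise ValueError("need at least one level")
    total_precision = 0.0
    weighted_mean_sum = 0.0
    for mean, variance in levels:
        if variance <= 0:
            raise ValueError("variance must be positive")
        precision = 1.0 / variance
        total_precision += precision
        weighted_mean_sum += precision * mean
    combined_variance = 1.0 / total_precision
    combined_mean = weighted_mean_sum * combined_variance
    return combined_mean, combined_variance


# Hypothetical log-F0 predictions for one frame from three linguistic levels.
syllable_level = (5.0, 0.04)
word_level = (5.2, 0.09)
phrase_level = (4.9, 0.25)

mean, variance = combine_gaussians([syllable_level, word_level, phrase_level])
print(f"combined mean={mean:.3f}, variance={variance:.4f}")
```

Under this assumption, levels with lower variance pull the combined mean more strongly toward their own prediction, and the combined variance is never larger than the smallest input variance.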
