Speech synthesis by combining probability distributions from different linguistic levels
Abstract
In a text-to-speech synthesiser that converts text input (15, fig. 1) to audio output (17), a multi-level probability model relating all potential speech vectors to a specific utterance is generated at a range of linguistic levels. The text is first converted into linguistic units at different levels (e.g. syllables at one level, words at another), each unit having a duration of several frames and an associated linguistic context (e.g. phonetic, prosodic, semantic or syntactic information). Each unit is then related to linear parameters of a speech signal contour according to probability distributions (fig. 3) in a model of speech vectors (e.g. fundamental frequency F0, LSP, aperiodicity; S405, fig. 4), whose means and variances are determined during training of the system (figs. 6-9). Finally, the probability distributions from all the levels are combined, using Bayesian and Expectation Maximization algorithms, to give a total Gaussian distribution for the speech vector x having a mean x and a variance P.
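The final combination step can be illustrated with a minimal sketch. The abstract does not give the combination formula, so the code below is an assumption: it treats each linguistic level as contributing an independent one-dimensional Gaussian over the same frame value (for example F0) and multiplies the densities. The result is again Gaussian, with a precision-weighted mean and a variance equal to the inverse of the summed precisions. The function name `combine_gaussians` and all numeric values are hypothetical, and the Expectation Maximization step and the mapping to linear contour parameters are left out.

```python
from typing import Sequence, Tuple


def combine_gaussians(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine 1-D Gaussians over the same variable by multiplying densities.

    Assumption (not from the source): each linguistic level supplies an
    independent Gaussian estimate of the same frame value.
    Returns (total_mean, total_variance).
    """
    if len(means) == 0 or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal length")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")

    # Precisions (inverse variances) add when Gaussian densities are multiplied.
    total_precision = sum(1.0 / v for v in variances)
    total_variance = 1.0 / total_precision

    # The combined mean weights each level's mean by its precision.
    weighted_sum = sum(m / v for m, v in zip(means, variances))
    total_mean = total_variance * weighted_sum
    return total_mean, total_variance


# Hypothetical per-level estimates of F0 (Hz) for one frame:
# a syllable-level and a word-level distribution with equal variance.
mean, variance = combine_gaussians([120.0, 130.0], [4.0, 4.0])
print(mean, variance)
```

With equal variances the combined mean falls midway between the two level means, and the combined variance is smaller than either input. A level with a tighter distribution pulls the combined mean towards its own.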