首页>
外国专利>
A Text-to-Speech Synthesis Method and System, a Method of Training a Text-to-Speech Synthesis System, and a Method of Calculating an Expressivity Score
A Text-to-Speech Synthesis Method and System, a Method of Training a Text-to-Speech Synthesis System, and a Method of Calculating an Expressivity Score
A text-to-speech synthesis method comprising: receiving text; inputting the received text in a prediction network; and generating speech data, wherein the prediction network comprises a neural network, and wherein the neural network is trained by: receiving a first training dataset comprising audio data and corresponding text data; acquiring an expressivity score for each audio sample of the audio data, wherein the expressivity score is a quantitative representation of how well an audio sample conveys emotional information and sounds natural, realistic and human-like; training the neural network using a first sub-dataset, and further training the neural network using a second sub-dataset, wherein the first sub-dataset and the second sub-dataset comprise audio samples and corresponding text from the first training dataset and wherein the average expressivity score of the audio data in the second sub-dataset is higher than the average expressivity score of the audio data in the first sub-dataset.
展开▼