Provided are a method, apparatus, medium, and device for speech synthesis based on prosodic boundaries. The method comprises: obtaining prosodic boundary information for the text information to be synthesized, and generating image embedding information on the basis of the prosodic boundary information (S102); generating a hidden state vector of the image embedding information and a sequence coding of the text information to be synthesized (S104); generating a speech spectrum on the basis of the hidden state vector and the sequence coding (S106); and synthesizing the speech information of the text information to be synthesized according to the speech spectrum (S108). Because the semantic and grammatical structure of a sentence can be analyzed on the text side, and the prosodic boundaries are represented by means of image embedding, the prosodic information in the text takes full part in training and inference, which improves the prosody of the synthesized speech. The invention also relates to blockchain technology: the hidden state vector and the sequence coding of the text information to be synthesized are stored in a blockchain, which improves the security of data storage.
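The data flow of steps S102 to S108 can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and not the claimed implementation: the four boundary levels, the one-hot vectors standing in for the embedded boundary information, the decayed running state standing in for the hidden state vector, and the placeholder "vocoder" are all hypothetical stand-ins for trained models. Only the order of the steps follows the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical boundary levels (an assumption, not from the abstract):
# 0 = none, 1 = prosodic word, 2 = prosodic phrase, 3 = intonation phrase.
NUM_LEVELS = 4


def boundary_embedding(levels: List[int]) -> List[List[float]]:
    """S102 (toy): one one-hot vector per token, standing in for the
    embedded prosodic boundary information."""
    return [[1.0 if i == lv else 0.0 for i in range(NUM_LEVELS)] for lv in levels]


def hidden_states(embedded: List[List[float]], decay: float = 0.5) -> List[List[float]]:
    """S104 (toy): exponentially decayed running state over the embeddings,
    standing in for a learned hidden state vector."""
    state = [0.0] * NUM_LEVELS
    out = []
    for vec in embedded:
        state = [decay * s + (1.0 - decay) * v for s, v in zip(state, vec)]
        out.append(state)
    return out


def sequence_coding(tokens: List[str]) -> List[int]:
    """S104 (toy): map each token to an integer ID in order of first appearance."""
    vocab: Dict[str, int] = {}
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


def spectrum(hidden: List[List[float]], codes: List[int]) -> List[List[float]]:
    """S106 (toy): one frame per token, the token ID followed by its hidden state."""
    return [[float(c)] + h for c, h in zip(codes, hidden)]


def synthesize(frames: List[List[float]]) -> List[float]:
    """S108 (toy): placeholder vocoder that collapses each frame to one sample."""
    return [sum(f) / len(f) for f in frames]


def run_pipeline(tagged: List[Tuple[str, int]]) -> List[float]:
    """Run S102-S108 on (token, boundary level) pairs."""
    tokens = [tok for tok, _ in tagged]
    levels = [lv for _, lv in tagged]
    hidden = hidden_states(boundary_embedding(levels))
    codes = sequence_coding(tokens)
    return synthesize(spectrum(hidden, codes))


if __name__ == "__main__":
    print(run_pipeline([("hello", 1), ("world", 3)]))
```

In a real system each toy function would be replaced by a trained component; the sketch only shows that the boundary representation and the text encoding are produced separately and then combined before the spectrum is generated.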