Speech synthesis methods and apparatuses are disclosed. The disclosed speech synthesis method uses a sub-encoder to determine a first feature vector, representing a speaker's speech characteristic, from feature vectors of a plurality of frames extracted from a partial section of the speaker's first speech signal. It then uses an autoregressive decoder, to which the first feature vector is input as an initial value, to determine from context information of a text a second feature vector of a second speech signal in which the text is uttered.
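
The data flow described above can be illustrated with a minimal sketch. The abstract does not specify the internals of the sub-encoder or the decoder, so everything here is an assumption made for illustration: `sub_encoder` is a hypothetical stand-in that mean-pools the frame vectors, `decoder_step` is a hypothetical tanh blend of the previous output with the current context vector, and all function and parameter names are invented. Only the overall structure follows the text: frames from a partial section yield a first feature vector, which seeds an autoregressive loop over the text's context information.

```python
import math


def sub_encoder(frames):
    """Hypothetical sub-encoder: mean-pool frame feature vectors into one vector.

    A real sub-encoder would be a trained network; averaging is only a stand-in.
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def decoder_step(prev, context, weight=0.5):
    """Hypothetical decoder step: blend the previous output with the context vector."""
    return [math.tanh(weight * p + (1.0 - weight) * c) for p, c in zip(prev, context)]


def autoregressive_decode(initial, contexts):
    """Run the decoder over the context vectors, feeding each output back in.

    `initial` plays the role of the first feature vector used as the initial value.
    """
    outputs = []
    prev = initial
    for context in contexts:
        prev = decoder_step(prev, context)
        outputs.append(prev)
    return outputs


def synthesize(first_signal_frames, start, length, contexts):
    """Derive the speaker vector from a partial section, then decode the text."""
    section = first_signal_frames[start:start + length]  # partial section only
    first_feature_vector = sub_encoder(section)
    second_feature_vectors = autoregressive_decode(first_feature_vector, contexts)
    return first_feature_vector, second_feature_vectors


if __name__ == "__main__":
    frames = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    contexts = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    speaker, decoded = synthesize(frames, start=1, length=2, contexts=contexts)
    print(speaker)
    print(len(decoded))
```

In this sketch the speaker vector enters the decoder only through its initial value, so changing the section of the first speech signal changes every decoded vector without altering the context information of the text.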