The computer system 1 includes a speaker information estimation unit 130 that estimates speaker information of an unknown speaker from the acoustic feature amounts of the unknown speaker, without requiring text input as teacher data. The speaker information of the unknown speaker includes a speaker code that represents the degree of similarity between the distribution of the acoustic feature amounts of the unknown speaker and the distributions of the acoustic feature amounts of a plurality of known speakers. The computer system 1 further includes a synthetic acoustic feature generation unit 220 that uses a multi-speaker acoustic model (DNN) 230 to generate a synthesized acoustic feature amount of the unknown speaker based on the language feature amount of input text and the speaker information of the unknown speaker, and a synthetic speech generation unit 240 that generates synthesized speech of the unknown speaker based on the synthesized acoustic feature amount of the unknown speaker.
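
The speaker code described above can be illustrated with a minimal sketch. The abstract does not state how the similarity between distributions is computed, so everything here is an assumption made for illustration: each known speaker is modeled as a diagonal Gaussian over acoustic feature frames, the unknown speaker's frames are scored by average log-likelihood under each model, and the scores are normalized with a softmax. The function names (`fit_diag_gaussian`, `avg_log_likelihood`, `speaker_code`) and the toy feature values are hypothetical. Note that no text is needed at any step.

```python
import math


def fit_diag_gaussian(frames):
    """Fit per-dimension mean and variance to a list of feature frames.

    Assumption: a diagonal Gaussian stands in for a known speaker's
    acoustic feature distribution; the source does not specify the model.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of frames under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def speaker_code(unknown_frames, known_models):
    """Return one similarity weight per known speaker, summing to 1.

    Assumption: a softmax over log-likelihood scores is used as the
    similarity measure; only acoustic frames are required as input.
    """
    scores = [avg_log_likelihood(unknown_frames, m) for m in known_models]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy two-dimensional "acoustic features" for two known speakers.
known_a = [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]
known_b = [[5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
models = [fit_diag_gaussian(known_a), fit_diag_gaussian(known_b)]

# Frames from an unknown speaker who sounds closer to speaker A.
print(speaker_code([[0.1, 0.0], [0.0, 0.1]], models))
```

In a system of the kind described, a vector like this would be supplied to the multi-speaker acoustic model together with the language feature amounts of the input text; that conditioning step is outside the scope of this sketch.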