Abstract

Background
Utterance copy consists of estimating the input parameters needed to reconstruct a speech signal using a speech synthesizer. This process is distinct from the more traditional text-to-speech, yet it is used in many areas, especially in linguistics and health. Utterance copy is a difficult inverse problem because the mapping is non-linear and many-to-one. Performing utterance copy manually requires a considerable amount of time, so automatic methods, such as the one proposed here, are of interest.

Methods
This work presents our system, based on a genetic algorithm (GA), to automatically estimate the input parameters of the Klatt synthesizer using an analysis-by-synthesis process.

Results
Results are presented for synthetic (computer-generated) and natural (human-generated) speech, for male and female speakers. These results are compared with the ones obtained with WinSnoori, the only currently available software that performs the same task.

Conclusions
The experiments showed that the proposed newGASpeech system is an effective alternative to the laborious manual process of estimating the input parameters of a Klatt synthesizer. It also outperforms the baseline by a large margin with respect to five objective figures of merit. For example, on average, the mean squared error is reduced to approximately 60.4 % and 75.2 % when natural target voices from male and female speakers are used, respectively.
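
The analysis-by-synthesis idea named in the Methods can be illustrated with a minimal sketch. It is not the newGASpeech implementation: the function `toy_synth` is a hypothetical one-sinusoid stand-in for a synthesizer (not the Klatt model), and the parameter ranges in `BOUNDS`, the population size, the tournament selection, the blend crossover, and the mutation scale are all assumptions chosen for illustration. The loop synthesizes a signal from each candidate parameter set, scores it against the target by mean squared error, and evolves the population toward lower error.

```python
import math
import random

def toy_synth(params, n=64):
    """Hypothetical stand-in synthesizer: a single sinusoid (NOT the Klatt model)."""
    freq, amp = params
    return [amp * math.sin(2 * math.pi * freq * t / n) for t in range(n)]

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Assumed search ranges for (freq, amp); illustrative only.
BOUNDS = [(1.0, 10.0), (0.1, 1.0)]

def clip(ind):
    """Keep every parameter inside its assumed range."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(ind, BOUNDS)]

def estimate(target, pop_size=40, generations=60, seed=0):
    """Search for synthesizer parameters whose output matches `target`."""
    rng = random.Random(seed)
    error = lambda ind: mse(toy_synth(ind), target)  # synthesize, then compare
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best error seen at each generation
    for _ in range(generations):
        pop.sort(key=error)
        history.append(error(pop[0]))
        next_pop = pop[:2]  # elitism: carry the two best candidates over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random candidates, twice.
            p1 = min(rng.sample(pop, 3), key=error)
            p2 = min(rng.sample(pop, 3), key=error)
            # Blend crossover followed by Gaussian mutation.
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            child = [v + rng.gauss(0, 0.05 * (hi - lo))
                     for v, (lo, hi) in zip(child, BOUNDS)]
            next_pop.append(clip(child))
        pop = next_pop
    best = min(pop, key=error)
    history.append(error(best))
    return best, history

# Target produced by the toy synthesizer itself, i.e. the "synthetic speech" case.
target = toy_synth([5.0, 0.8])
best, history = estimate(target)
```

Because the two best candidates are always carried over, the best error in `history` never increases from one generation to the next. Since the mapping is many-to-one, the recovered parameters may still differ from the ones that generated the target even when the error is low.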