We are developing an emotional-voice synthesizer based on TD-PSOLA. In this paper, we first give examples of the benefits of including non-linguistic information in synthesized speech. Second, we explain why we adopted TD-PSOLA with VCV waveform segments, and propose a Prosodic-Balanced Database to compensate for a shortcoming of the algorithm. Third, we analyze emotional utterances in order to build the balanced database and to derive formulas that predict phone length. Fourth, we build an emotional-voice synthesizer and synthesize sample emotional utterances. In a listening test using these samples, the emotion recognition rate was 84.1% and the intelligibility of the synthesized speech was 97.9%.