Statistical vocoder-based speech synthesis systems have a small footprint and the flexibility to change voice characteristics. However, the synthesized speech sounds mechanical or slightly buzzy compared with natural human speech. A mixed excitation model, rather than either a periodic impulse train or white noise alone, is commonly used in low-bit-rate speech coding. In this paper, we extend it to statistical vocoder-based speech synthesis. We also compare two methods of extracting periodicity ratios for the mixed excitation model: a comb filter and the normalized correlation coefficient. Excitation parameters are modeled by HMMs in a slave manner, where the state boundaries are given by the spectral and pitch models. Two corpora, uttered by a male and a female speaker, are used to evaluate the mixed excitation model. The experimental results show that the mixed excitation model significantly improves the voice quality of the synthesized speech, and that the comb filter method of extracting periodicity ratios slightly outperforms the normalized correlation coefficient method.
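
To make the two central ideas concrete, the following is a minimal sketch, not the system evaluated in this paper. It assumes a single full-band periodicity ratio, taken as the normalized correlation coefficient at the pitch lag and clipped to [0, 1], and it blends an impulse train with white noise using square-root gains. The function names `normalized_correlation` and `mixed_excitation`, the clipping, and the gain rule are all illustrative assumptions.

```python
import math
import random


def normalized_correlation(frame, lag):
    """Normalized correlation between a frame and itself delayed by `lag` samples.

    Values near 1 suggest a strongly periodic frame at that lag.
    """
    n = len(frame) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be positive and shorter than the frame")
    a = frame[:n]
    b = frame[lag:lag + n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def mixed_excitation(num_samples, pitch_period, periodicity, seed=0):
    """Blend an impulse train and white noise (assumed single-band mixing rule).

    `periodicity` is clipped to [0, 1]; 1 gives a pure impulse train,
    0 gives pure Gaussian noise. Square-root gains keep the expected
    power roughly constant across mixing ratios.
    """
    rng = random.Random(seed)
    p = min(max(periodicity, 0.0), 1.0)
    pulse_gain = math.sqrt(p)
    noise_gain = math.sqrt(1.0 - p)
    out = []
    for i in range(num_samples):
        # Impulses are scaled so the train has unit average power.
        pulse = math.sqrt(pitch_period) if i % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        out.append(pulse_gain * pulse + noise_gain * noise)
    return out


# Usage: estimate periodicity of a synthetic voiced frame, then build excitation.
period = 40
frame = [math.sin(2.0 * math.pi * i / period) for i in range(200)]
ratio = max(0.0, normalized_correlation(frame, period))
excitation = mixed_excitation(80, period, ratio)
```

In a practical mixed excitation vocoder the periodicity is usually estimated and applied per frequency band rather than once over the full band; the sketch collapses this to one band for brevity.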