Existing voice conversion (VC) systems based on Gaussian mixture models (GMM) suffer from over-smoothing introduced by the GMM mapping. To address this problem, this paper proposes a method based on the Acoustical Universal Structure (ASU) that can be applied to GMM-based voice conversion. Our contributions are as follows: 1) The speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is adopted, which allows flexible manipulation of speech parameters such as pitch, vocal tract length, and speaking rate while maintaining high reproduction quality. 2) The main advantage of the method comes from introducing the predictable spectrum: the ASU is used to form the mapping relationship between the source speaker and the target speaker. 3) In the training phase, a feedback strategy is adopted, which guarantees a smooth transition of spectral parameters between frames. Objective and subjective tests indicate that the proposed method considerably improves VC performance in terms of speech quality, conversion accuracy, and the naturalness of speaker individuality.