To construct a natural singing-voice synthesis system, the synthesis method must adequately control acoustic features such as fundamental frequency (F0), spectral shape, and phoneme duration. This paper identifies the acoustic features that affect singing-voice perception through a comparative analysis of singing and speaking voices, and then proposes a method for transforming a speaking voice into a singing voice using STRAIGHT. The method consists of an F0 control model that generates the F0 contours of singing voices, a spectral sequence control model that modifies the spectral shapes of the speaking voice, and a duration control model based on rhythm. Results showed that the proposed system could synthesize a natural singing voice whose sound quality is almost the same as that of a real one.