The pitch-synchronous overlap-add (PSOLA) speech synthesis method has conventionally been used for high-quality waveform concatenation. It relies on the periodic structure of voiced speech, i.e., on the pitchmarks. Although PSOLA-synthesized sound is of high quality as long as pitchmark detection succeeds, it can be severely degraded when pitchmark detection fails or, more fundamentally, when the sound is an unvoiced consonant. In this paper, we propose a pitch-asynchronous waveform-concatenation speech synthesis method. It is based on adaptive phase optimization using complex-valued neural processing to maintain a desirable degree of pulse sharpness. Experimental results demonstrate successful generation of high-quality sound.