This paper addresses two issues in improving the quality of F0-modified speech based on PSOLA analysis-synthesis. Previous studies [1][2] pointed out that the location of the PSOLA window influences the quality of synthesized speech, and one of them claimed that the center of the window should be located at a pitch pulse in the source waveform. However, pitch pulse detection sometimes fails due to undesired acoustic events. In this paper, several methods are experimentally examined to reduce pitch pulse detection errors. Even when detection is correct, F0-modified re-synthesized speech sometimes exhibits "echoes" in the re-arranged waveforms. These are mainly caused by a pitch pulse with low sharpness, or by a pitch pulse flanked before and after by two relatively high pulses that are not pitch pulses. To suppress the echoes with little loss of naturalness, partial zero/π-phase conversion is proposed. Experiments demonstrate the validity of the proposed methods in improving the quality of re-synthesized speech.