In this paper, a minimal representation of voiced speech based on decomposition into AM-FM components is proposed for the generation of emotional speech. The decomposition proceeds in three steps. First, the time-frequency boundaries of the AM-FM components are estimated. Second, each AM-FM component is extracted using a variable bandwidth filter [17] that adapts to the estimated time-frequency boundaries. Finally, two parameters, the instantaneous frequency and the instantaneous amplitude, are estimated for each AM-FM component. The set of instantaneous amplitudes and instantaneous frequencies is the minimal representation of a voiced speech signal. This representation is an optimal feature set because it effectively describes the biomechanical characteristics of the vocal cords and the vocal tract. Emotional speech is generated by modifying these parameters of raw speech signals.
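The final estimation step and the parameter modification can be illustrated with a minimal sketch. It assumes that a single AM-FM component has already been isolated; the variable bandwidth filter of [17] is not reproduced here. It also assumes an analytic-signal (Hilbert transform) estimator, which the abstract does not specify, so this is one common way to obtain the two parameters and not necessarily the method used in the paper. The function names `analytic_signal`, `instantaneous_parameters`, and `resynthesize` are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0       # keep the Nyquist bin once
        weights[1:n // 2] = 2.0     # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_parameters(component, fs):
    """Return (amplitude, frequency in Hz) of one isolated AM-FM component.

    Assumption: Hilbert-based estimation; `frequency` has len(component) - 1 samples.
    """
    z = analytic_signal(component)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency


def resynthesize(amplitude, frequency, fs, phase0=0.0):
    """Rebuild a(t) * cos(phi(t)) from (possibly modified) parameters."""
    steps = 2.0 * np.pi * np.asarray(frequency) / fs
    phase = np.concatenate(([phase0], phase0 + np.cumsum(steps)))
    return amplitude * np.cos(phase)


# Synthetic stand-in for one extracted component: 200 Hz carrier, slow 4 Hz AM.
fs = 8000
t = np.arange(2000) / fs
envelope = 1.0 + 0.3 * np.cos(2.0 * np.pi * 4.0 * t)
component = envelope * np.cos(2.0 * np.pi * 200.0 * t)

amp, freq = instantaneous_parameters(component, fs)

# Illustrative modification only: raise the instantaneous frequency by 20 %.
modified = resynthesize(amp, 1.2 * freq, fs)
```

In this sketch the pair `(amp, freq)` plays the role of the minimal representation for one component, and scaling `freq` before resynthesis stands in for the parameter changes used to alter the perceived emotion. The 20 % factor is arbitrary and is not taken from the paper.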