The objective of this work is to explore the importance of the parameters that contribute to the synthesis of expression in vocal communication. The algorithm discussed in this paper combines Dynamic Time Warping (DTW) with prosody manipulation to convert one emotion into another, and compares this inter-emotion conversion with neutral-to-emotion conversion using objective and subjective performance indices. Existing explicit control methods are based on prosody modification with neutral speech as the starting point, and have not explored the possibility of conversion between inter-related emotions. In addition, most previous work relies entirely on perception tests to evaluate speech quality after synthesis. In this paper, the objective comparison, expressed as an error percentage, is verified against the results of a forced-choice perception test. Both indicate that inter-emotion conversion is effective and yields speech of better quality. The synthesis results and spectrograms show the same.
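As an illustration of the alignment step only, the following is a minimal sketch of textbook DTW applied to two short pitch contours. It is not the algorithm evaluated in this paper: the function name `dtw_path`, the absolute-difference frame cost, and the contour values are all assumptions chosen for the example.

```python
def dtw_path(source, target):
    """Align two 1-D contours with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs mapping source frame i to target frame j.
    Assumption: the local cost is the absolute difference between frames.
    """
    n, m = len(source), len(target)
    inf = float("inf")

    # cost[i][j] = minimal cumulative cost of aligning source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(source[i - 1] - target[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both contours
                cost[i - 1][j],      # advance in source only
                cost[i][j - 1],      # advance in target only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical pitch contours in Hz: the target is a time-stretched
# version of the source, so a zero-cost alignment exists.
source_f0 = [100, 120, 140, 120]
target_f0 = [100, 100, 120, 140, 140, 120]
distance, path = dtw_path(source_f0, target_f0)
```

The returned path pairs each source frame with one or more target frames, which is the kind of time alignment a prosody modification step could then use to transfer duration or pitch values from one utterance to the other.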