Speech is a rich source of information. A speech sample conveys not only what is being spoken but also the emotional state of the speaker. In this paper, the dynamics of prosodic and spectral features are used to encode the mood content of speakers of the Assamese language, including its dialectal variations. A composite feature set is created by fusing the spectral and prosodic features. The performance of the system is evaluated using two classifiers, namely a Recurrent Neural Network (RNN) and a Feed Forward Time Delay Neural Network (FFTDNN). A comparative analysis of their computational speed and recognition rates is presented. The performance of the proposed mood verification system is also evaluated under varying background noise conditions.