To improve the recognition rate of speaker-independent speech emotion recognition, a combined feature selection and feature fusion method based on multiple kernel learning is presented. First, multiple kernel learning is used to obtain sparse feature subsets. The features selected at least n times are recombined into another subset, named the n-subset. The optimal n is determined by 10 cross-validation experiments. Second, feature fusion is performed at the kernel level. Each kind of feature is associated with a kernel and, unlike in previous studies, the full feature set is also associated with a kernel of its own. All of the kernels are added together to obtain a combined kernel. The final recognition rate for 7 emotions on the Berlin Database is 83.10%, which outperforms previously reported state-of-the-art results and shows the effectiveness of the method. The results also indicate that MFCCs play a crucial role in speech emotion recognition.
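The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The input format (one set of selected feature indices per sparse multiple kernel learning run), the RBF kernel, the `gamma` value, and the group names `mfcc` and `prosody` are all assumptions made for illustration. The function `n_subset` keeps the features selected at least n times, and `combined_kernel` adds one kernel per feature group to one kernel on the full feature set.

```python
import math


def n_subset(selection_runs, n):
    """Return sorted indices of features selected in at least n runs.

    selection_runs: list of sets, one per sparse MKL run, each holding the
    indices of the features that run selected (hypothetical input format).
    """
    counts = {}
    for selected in selection_runs:
        for idx in selected:
            counts[idx] = counts.get(idx, 0) + 1
    return sorted(i for i, c in counts.items() if c >= n)


def rbf(x, y, cols, gamma=1.0):
    """RBF kernel restricted to the given columns (kernel choice is assumed)."""
    sq_dist = sum((x[c] - y[c]) ** 2 for c in cols)
    return math.exp(-gamma * sq_dist)


def combined_kernel(X, Y, groups, gamma=1.0):
    """Unweighted sum of one kernel per feature group plus a full-set kernel."""
    all_cols = range(len(X[0]))
    col_sets = [all_cols] + list(groups.values())
    return [[sum(rbf(x, y, cols, gamma) for cols in col_sets) for y in Y]
            for x in X]


# Toy data: three selection runs over four features, two samples.
runs = [{0, 2, 3}, {0, 3}, {0, 1, 3}]
selected = n_subset(runs, n=2)

X = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 2.0, 1.0]]
groups = {"mfcc": [0, 1], "prosody": [2, 3]}  # hypothetical feature groups
K = combined_kernel(X, X, groups)
```

In this toy setup, features 0 and 3 appear in all three runs, so they form the 2-subset. The resulting matrix `K` could then be passed to any classifier that accepts a precomputed kernel. The value of n would be tuned by cross-validation, as the abstract describes.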