IEEE World AI IoT Congress

An Audio Processing Approach using Ensemble Learning for Speech-Emotion Recognition for Children with ASD


Abstract

Children with Autism Spectrum Disorder (ASD) find it difficult to detect human emotions in social interactions. A speech emotion recognition system was developed in this work, which aims to help these children better identify the emotions of their communication partner. The system was developed using machine learning and deep learning techniques. Through the use of ensemble learning, multiple machine learning algorithms were combined to provide a final prediction on the recorded input utterances. The ensemble of models includes a Support Vector Machine (SVM), a Multi-Layer Perceptron (MLP), and a Recurrent Neural Network (RNN). All three models were trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Toronto Emotional Speech Set (TESS), and the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). A fourth dataset was used, which was created by adding background noise to the clean speech files from the previously mentioned datasets. The paper describes the audio processing of the samples, the techniques used to add the background noise, and the feature extraction coefficients considered for the development and training of the models. This study presents the performance evaluation of the individual models on each of the datasets, with the inclusion of background noise, and on the combination of all the samples from all three datasets. This evaluation was used to select the optimal hyperparameter configuration of each model and then to assess the performance of the ensemble learning approach through majority voting. The ensemble reached a peak accuracy of 66.5%, a higher emotion classification accuracy than the MLP model, which reached 65.7%.

