IEEE World AI IoT Congress

An Audio Processing Approach using Ensemble Learning for Speech-Emotion Recognition for Children with ASD


Abstract

Children with Autism Spectrum Disorder (ASD) find it difficult to detect human emotions in social interactions. A speech emotion recognition system was developed in this work, which aims to help these children better identify the emotions of their communication partner. The system was developed using machine learning and deep learning techniques. Through the use of ensemble learning, multiple machine learning algorithms were combined to provide a final prediction on the recorded input utterances. The ensemble of models includes a Support Vector Machine (SVM), a Multi-Layer Perceptron (MLP), and a Recurrent Neural Network (RNN). All three models were trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Toronto Emotional Speech Set (TESS), and the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). A fourth dataset was used, which was created by adding background noise to the clean speech files from the previously mentioned datasets. The paper describes the audio processing of the samples, the techniques used to add the background noise, and the feature extraction coefficients considered for the development and training of the models. This study presents the performance evaluation of the individual models on each of the datasets, with the inclusion of background noise, and on the combination of all the samples from all three datasets. This evaluation was used to select the optimal hyperparameter configuration of each model and then to assess the performance of the ensemble learning approach through majority voting. The ensemble reached a peak accuracy of 66.5%, a higher emotion classification accuracy than the MLP model, which reached 65.7%.

