
Human emotion recognition from videos using spatio-temporal and audio features



Abstract

In this paper, we present a human emotion recognition system based on audio and spatio-temporal visual features. The system is evaluated on a public audio-visual emotion data set that includes different subjects of both genders. Mel-frequency cepstral coefficients (MFCC) and prosodic features are first identified and then extracted from emotional speech. For facial expressions, spatio-temporal features are extracted from the visual streams. Principal component analysis (PCA) is applied to reduce the dimensionality of the visual features while retaining 97% of the variance. A codebook is constructed in Euclidean space for both the audio and the visual features. Histograms of codeword occurrences are then used as input to a state-of-the-art SVM classifier to obtain the decision of each classifier. The decisions of the individual classifiers are then combined using the Bayes sum rule (BSR) as the final decision step. Experimental results and simulations show that visual features alone yield an average accuracy of 74.15%, while audio features alone yield an average recognition accuracy of 67.39%. Combining audio and visual features improves the overall system accuracy to 80.27%.
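The codebook-histogram and sum-rule steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy codebook, and the example posterior scores are all hypothetical, and the sketch assumes that the codebook is already given, that codewords are assigned by nearest Euclidean distance, and that the sum rule adds the per-class scores of the two classifiers with equal weights. The abstract does not state any of these details, and the SVM training step is left out.

```python
import math
from typing import List, Sequence


def nearest_codeword(vec: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vec, codebook[k]))


def codebook_histogram(features: Sequence[Sequence[float]],
                       codebook: Sequence[Sequence[float]]) -> List[float]:
    """Normalized histogram of codeword occurrences for one clip."""
    counts = [0] * len(codebook)
    for vec in features:
        counts[nearest_codeword(vec, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def sum_rule_fusion(audio_scores: Sequence[float],
                    visual_scores: Sequence[float]) -> int:
    """Add per-class scores from both classifiers and return the best class.

    Assumption: equal weights for the two modalities.
    """
    fused = [a + v for a, v in zip(audio_scores, visual_scores)]
    return max(range(len(fused)), key=fused.__getitem__)


# Toy data (hypothetical): a two-word codebook and three feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.2], [1.0, 0.8]]
hist = codebook_histogram(features, codebook)

# Hypothetical per-class scores from the audio and visual classifiers.
audio_scores = [0.5, 0.4, 0.1]
visual_scores = [0.2, 0.6, 0.2]
fused_class = sum_rule_fusion(audio_scores, visual_scores)
```

In this toy case the audio scores alone favor class 0, while the fused scores favor class 1, which shows how one modality can overturn the other's decision under the sum rule.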

Bibliographic details

  • Source
    The Visual Computer | 2013, Issue 12 | pp. 1269-1275 | 7 pages
  • Author affiliations

    College of Engineering (COE), Karachi Institute of Economics and Technology (KIET), 75190, Korangi Creek, Karachi, Pakistan;

    Computer Vision, Video and Image Processing Lab (CVVIP), Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM 81310, Skudai, Johor Bahru, Malaysia;

    Computer Vision, Video and Image Processing Lab (CVVIP), Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM 81310, Skudai, Johor Bahru, Malaysia;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Keywords

    Human computer interface (HCI); Multimodal system; Human emotions; Support vector machines (SVM); Spatio-temporal features;

