Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition

Michalis Papakostas; Evaggelos Spyrou; Theodoros Giannakopoulos; Giorgos Siantikos; Dimitrios Sgouropoulos; Phivos Mylonas; Fillia Makedon

Abstract

Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or to understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human's emotions may be recognized using several modalities, such as facial expressions, speech, and physiological parameters (e.g., electroencephalograms, electrocardiograms). However, measuring these modalities may be difficult, obtrusive, or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input, thus making hand-crafted feature engineering optional in many tasks. In this paper no features are required other than the spectrogram representations; hand-crafted features were extracted only to validate our method. Moreover, the approach does not require any linguistic model and is not specific to any particular language. We evaluate the proposed approach on cross-language datasets and demonstrate that it provides superior results compared with traditional approaches that use hand-crafted features.
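As a rough illustration of the "spectrogram representation" the abstract refers to, the sketch below turns a one-dimensional audio signal into a log-magnitude spectrogram, i.e., a two-dimensional array that a CNN could treat as an image. The abstract does not state the paper's actual settings, so the function name `log_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic test tone are all assumptions made for this example, and the CNN itself is not shown.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a (frames x frequency bins) log-magnitude spectrogram.

    frame_len, hop and the Hann window are illustrative assumptions,
    not parameters taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)


# Synthetic stand-in for a speech clip: a 440 Hz tone, 1 s at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)
# Frequency bin with the highest average energy across all frames.
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 256
```

With real recordings, `tone` would be replaced by the decoded waveform of an utterance, and the resulting array would be resized or cropped to whatever fixed input size the network expects.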

Bibliographic details

Source: Computation, 2017, Issue 2
Original format: PDF
Chinese Library Classification: Mathematics

Similar literature

1. Cross-Domain Deep Visual Feature Generation for Mandarin Audio–Visual Speech Recognition [J]. Rongfeng Su, Xunying Liu, Lan Wang. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020
2. Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition [J]. Shiqing Zhang, Shiliang Zhang, Tiejun Huang. IEEE Transactions on Circuits and Systems for Video Technology. 2018, Issue 10
3. Audio-visual feature fusion via deep neural networks for automatic speech recognition [J]. Mohammad Hasan Rahmani, Farshad Almasganj, Seyyed Ali Seyyedsalehi. Digital Signal Processing. 2018
4. Video Emotion Recognition using Hand-Crafted and Deep Learning Features [C]. Xiaohan Xia, Jiamu Liu, Tao Yang. 2018 First Asian Conference on Affective Computing and Intelligent Interaction. 2018
5. Deep Learning Method vs. Hand-Crafted Features for Lung Cancer Diagnosis and Breast Cancer Risk Analysis [D]. Sun, Wenqing. 2017
6. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features [O]. Tursunov Anvarjon, Mustaqeem, Soonil Kwon. 2020
7. Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition [O]. Michalis Papakostas, Evaggelos Spyrou, Theodoros Giannakopoulos. 2017
