Journal: Computation

Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition



Abstract

Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction, or in understanding the affective state of users in tasks where other modalities such as video or physiological parameters are unavailable. In general, human emotions may be recognized through several modalities, such as facial expressions, speech, and physiological parameters (e.g., electroencephalograms, electrocardiograms). However, measuring these modalities may be difficult, obtrusive, or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input, thus making hand-crafted feature engineering optional in many tasks. In this paper, no features other than the spectrogram representations are required; hand-crafted features were extracted only to validate our method. Moreover, the approach does not require any linguistic model and is not specific to any particular language. We evaluate the proposed approach on cross-language datasets and demonstrate that it provides superior results compared with traditional approaches that use hand-crafted features.
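The abstract names spectrogram representations as the only input to the CNN but gives no parameters for them. As a rough illustration of what such an input could look like, the following is a minimal sketch of a log-magnitude spectrogram computed with NumPy. The function name `log_spectrogram`, the frame length, the hop size, and the Hann window are all assumptions made for this example, not settings taken from the paper, and the CNN itself is not shown.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    frame_len, hop and the Hann window are illustrative assumptions,
    not values reported in the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression in decibels; eps avoids log(0).
    return 20.0 * np.log10(magnitude + eps).T
```

Calling `log_spectrogram` on one second of audio sampled at 16 kHz with these defaults yields a 129 x 124 array (frequency bins by time frames), which can be treated as a single-channel image by an image-style CNN.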
