IEEE Transactions on Affective Computing

Integrating Deep and Shallow Models for Multi-Modal Depression Analysis—Hybrid Architectures

Abstract

At present, although great progress has been made in automatic depression assessment, most recent works consider only the audio and video paralinguistic information, rather than the linguistic information in the spoken content. In this work, we argue that, besides developing good audio and video features, building reliable depression detection systems also requires text-based content features to analyse depression-related textual indicators. Furthermore, improving the performance of automatic depression assessment systems requires powerful models capable of modelling the characteristics of depression embedded in the audio, visual and text descriptors. This paper proposes new text and video features and hybridizes deep and shallow models for depression estimation and classification from audio, video and text descriptors. The proposed hybrid framework consists of three main parts: 1) a Deep Convolutional Neural Network (DCNN) and Deep Neural Network (DNN) based audio-visual multi-modal depression recognition model for estimating the Patient Health Questionnaire depression scale (PHQ-8); 2) a Paragraph Vector (PV) and Support Vector Machine (SVM) based model for inferring the physical and mental conditions of the individual from the transcripts of the interview; 3) a Random Forest (RF) model for depression classification from the estimated PHQ-8 score and the inferred conditions of the individual. In the PV-SVM model, PV embedding is used to obtain fixed-length feature vectors from transcripts of the answers to questions associated with psychoanalytic aspects of depression, which are subsequently fed into SVM classifiers to detect the presence or absence of the considered psychoanalytic symptoms. To the best of our knowledge, this is the first attempt to apply PV to depression analysis. In addition, we propose a new visual descriptor, the Histogram of Displacement Range (HDR), to characterize the displacement and velocity of the facial landmarks in a video segment. Experiments carried out on the Audio Visual Emotion Challenge (AVEC2016) depression dataset demonstrate that: 1) the proposed hybrid framework effectively improves the accuracy of both depression estimation and depression classification, with an average F1 measure of up to 0.746, which is higher than the best result (0.724) of the depression sub-challenge of AVEC2016; and 2) HDR achieves better depression recognition performance than Bag-of-Words (BoW) and Motion History Histogram (MHH) features.
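The abstract mentions two concrete technical components that a short sketch can make more tangible. The code below is illustrative only and is not taken from the paper. First, a minimal NumPy sketch of an HDR-style descriptor, assuming landmark trajectories are supplied as a (frames x landmarks x 2) array; the bin count, value range and normalisation are placeholders rather than the paper's exact definition.

```python
import numpy as np

def hdr_descriptor(landmarks, bins=8, max_range=20.0):
    """landmarks: (T, N, 2) array -- T frames of N 2-D facial landmark positions (pixels)."""
    # Frame-to-frame displacement magnitude of every landmark.
    disp = np.linalg.norm(np.diff(landmarks, axis=0), axis=-1)   # shape (T-1, N)
    # Range of movement per landmark over the segment: a crude proxy for the
    # displacement/velocity variation described in the abstract.
    rng = disp.max(axis=0) - disp.min(axis=0)                    # shape (N,)
    hist, _ = np.histogram(rng, bins=bins, range=(0.0, max_range))
    return hist / max(hist.sum(), 1)                             # histogram normalised over landmarks

segment = np.random.rand(300, 68, 2) * 5   # toy segment: 300 frames, 68 landmarks
print(hdr_descriptor(segment))
```

Second, a hedged sketch of the text branch and the final fusion step: Paragraph Vector (Doc2Vec in gensim) embeddings of interview transcripts feed an SVM symptom classifier, and a Random Forest combines its output with an assumed PHQ-8 estimate from the audio-visual model. All data, labels and hyperparameters here are toy placeholders; the paper trains one SVM per considered psychoanalytic symptom.

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

transcripts = [["i", "feel", "tired", "all", "day"], ["sleep", "has", "been", "fine"]]  # toy token lists
symptom_labels = np.array([1, 0])        # hypothetical presence/absence of one symptom
phq8_estimates = np.array([14.0, 3.0])   # stand-in for the DCNN-DNN audio-visual model's output
depressed = np.array([1, 0])             # toy binary depression labels

# Paragraph Vector embeddings of the transcripts.
pv = Doc2Vec([TaggedDocument(t, [i]) for i, t in enumerate(transcripts)],
             vector_size=50, min_count=1, epochs=40)
X_text = np.vstack([pv.infer_vector(t) for t in transcripts])

# SVM detects the presence/absence of the symptom from the PV embedding.
symptom_pred = SVC().fit(X_text, symptom_labels).predict(X_text)

# Random Forest fuses the estimated PHQ-8 score with the inferred conditions.
X_fusion = np.column_stack([phq8_estimates, symptom_pred])
rf = RandomForestClassifier(n_estimators=100).fit(X_fusion, depressed)
print(rf.predict(X_fusion))
```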
