Chinese Automation Congress (conference paper)

Improvement on Speech Depression Recognition Based on Deep Networks




To reduce the burden on clinicians of diagnosing depression across large numbers of cases, artificial intelligence researchers are increasingly interested in designing systems that recognize depression automatically. The speech signals of depressed patients differ from those of healthy speakers. Here, we present a deep model, Depression AudioNet, which encodes depression-related characteristics of the vocal tract and provides a more comprehensive audio representation. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from the raw audio. Second, robust emotion features are obtained with Multiscale Audio Delta Normalization (MADN), a data processing algorithm we propose. Finally, the MFCCs and emotion features of two adjacent local audio segments are fed into Depression AudioNet in turn to train the network. By lengthening each sample without reducing the number of samples, this method addresses the problems of limited training data and low accuracy. Experiments on the AVEC2014 dataset show that the proposed method is more effective and accurate than existing speech-based depression recognition algorithms.
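The first step of the pipeline, MFCC extraction, can be sketched in plain NumPy as follows. The abstract does not state the authors' framing or filterbank settings. The values below are common defaults assumed for illustration: 16 kHz audio, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 26 mel filters, 13 coefficients, and pre-emphasis of 0.97. The function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical MFCC sketch; all parameter defaults are assumptions, not the paper's settings.

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_coeffs=13, preemph=0.97):
    """Return an array of shape (n_frames, n_coeffs)."""
    # Pre-emphasis boosts the high-frequency part of the spectrum.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    # Zero-pad the tail so the last frame is complete (always at least one frame).
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - frame_len) / hop)))
    pad_len = (n_frames - 1) * hop + frame_len
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    # Slice overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Log mel-filterbank energies; the floor avoids log(0) on silence.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Orthonormal DCT-II decorrelates the log energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_energies @ dct.T

# Usage: one second of a noisy 220 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
audio = np.sin(2 * np.pi * 220.0 * t) + 0.01 * rng.standard_normal(16000)
feats = mfcc(audio)
print(feats.shape)  # (99, 13)
```

A library such as librosa provides `librosa.feature.mfcc` for the same purpose. Its default filterbank and scaling differ from this sketch, so its outputs will not match these numbers exactly.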

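The abstract claims that pairing adjacent segments lengthens each sample without reducing the sample count. The sketch below illustrates that claim under one plausible reading of the abstract; it is not the authors' code. The segment length, the concatenation along the time axis, and the names `split_segments` and `pair_adjacent` are assumptions. Random numbers stand in for the MFCC and MADN features.

```python
import numpy as np

# Hypothetical sketch of adjacent-segment pairing; not the authors' implementation.

def split_segments(features, seg_len):
    """Cut a (frames, dims) matrix into non-overlapping segments of seg_len frames.
    Trailing frames that do not fill a whole segment are dropped."""
    n_segs = features.shape[0] // seg_len
    return features[: n_segs * seg_len].reshape(n_segs, seg_len, features.shape[1])

def pair_adjacent(segments):
    """Join each segment with the one that follows it along the time axis:
    n segments of seg_len frames -> n - 1 samples of 2 * seg_len frames."""
    return np.concatenate([segments[:-1], segments[1:]], axis=1)

# Random numbers stand in for 1000 frames of 13-dimensional features.
feats = np.random.default_rng(0).standard_normal((1000, 13))
segs = split_segments(feats, seg_len=100)     # 10 segments of 100 frames
pairs = pair_adjacent(segs)                   # 9 samples of 200 frames
longer = split_segments(feats, seg_len=200)   # naive alternative: 5 samples of 200 frames
print(segs.shape, pairs.shape, longer.shape)
# (10, 100, 13) (9, 200, 13) (5, 200, 13)
```

Cutting the same 1,000 frames directly into 200-frame segments yields only 5 samples. Pairing adjacent 100-frame segments yields 9 samples of the same length, at the cost of each interior segment appearing in two samples.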



