Published in: IEEE Journal of Selected Topics in Signal Processing

Natural Language Processing Methods for Acoustic and Landmark Event-Based Features in Speech-Based Depression Detection


Abstract

The processing of speech as an explicit sequence of events is common in automatic speech recognition (linguistic events), but has received relatively little attention in paralinguistic speech classification, despite its potential for characterizing broad acoustic event sequences. This paper proposes a framework for analyzing speech as a sequence of acoustic events and investigates its application to depression detection. In this framework, regions of the acoustic space are tokenized into 'words' that represent speech events at fixed or irregular intervals. This tokenization allows acoustic word features to be exploited using proven natural language processing methods. A key advantage of the framework is its ability to accommodate heterogeneous event types: herein we combine acoustic words with speech landmarks, which are articulation-related speech events. Another advantage is the option to fuse such heterogeneous events at various levels, including the embedding level. Evaluation on both controlled, laboratory-grade supervised audio recordings and unsupervised, self-administered smartphone recordings highlights the merits of the proposed framework across both datasets, with the proposed landmark-dependent acoustic words achieving improvements in F1(depressed) of up to 15% and 13% for SH2-FS and DAIC-WOZ respectively, relative to acoustic speech baseline approaches.
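The tokenization idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how acoustic regions are defined, so the nearest-centroid codebook, the toy 2-D frames, and the helper names `tokenize`, `collapse_repeats` and `ngram_counts` are all assumptions made for illustration. The sketch maps each frame to an acoustic 'word' at fixed intervals, keeps only region changes to obtain events at irregular intervals, and then counts bigrams as a simple text-style feature.

```python
from collections import Counter
from math import dist


def tokenize(frames, codebook):
    """Map each feature frame to the label of its nearest codebook centroid."""
    tokens = []
    for frame in frames:
        nearest = min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        tokens.append(f"w{nearest}")
    return tokens


def collapse_repeats(tokens):
    """Keep a token only when the acoustic region changes (irregular intervals)."""
    return [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]


def ngram_counts(tokens, n=2):
    """Count n-grams of acoustic words, as in a bag-of-n-grams text model."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical data: a hand-picked 3-centroid codebook and six 2-D frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.1), (1.0, 0.9), (1.9, 0.1), (0.0, 0.2)]

fixed = tokenize(frames, codebook)      # one word per frame (fixed interval)
events = collapse_repeats(fixed)        # one word per region change
bigrams = ngram_counts(events, n=2)     # sparse feature counts for a classifier
```

In practice the codebook would be learned from data (for example by clustering frame-level acoustic features), and the resulting counts or learned embeddings would feed a classifier; those choices are left open here because the abstract does not specify them.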
