Conference: International Workshop on Language Cognition and Computational Models

Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP



Abstract

Natural language processing researchers have demonstrated that machine learning approaches can detect depression-related cues in language; however, to date, these efforts have primarily assumed it was acceptable to leave depression-related texts in the data. Our concerns with this are twofold: first, that the models may be overfitting to depression-related signals, which may not be present for all depressed users (only for those who talk about depression on social media); and second, that these models would underperform for users who are sensitive to the public stigma of depression. This study demonstrates the validity of those concerns. We construct a novel corpus of texts from 12,106 Reddit users and perform lexical and predictive analyses under two conditions: one where all text produced by the users is included and one where the depression-related posts are withheld. We find significant differences in the language used by depressed users under the two conditions, as well as a difference in the ability of machine learning algorithms to correctly detect depression. However, despite the lexical differences and reduced classification performance (each of which suggests that users may be able to fool algorithms by avoiding direct discussion of depression), a still-respectable overall performance suggests that lexical models are reasonably robust and well suited for a role in a diagnostic or monitoring capacity.
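
The two-condition setup described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how depression-related posts were identified, so the term list, the subreddit list, the post format, and every function name below are assumptions made purely for illustration, and the toy data is invented.

```python
import re
from collections import Counter

# Hypothetical filter criteria: the abstract does not say how
# depression-related posts were identified.
DEPRESSION_TERMS = {"depression", "depressed", "antidepressant"}
DEPRESSION_SUBREDDITS = {"depression"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def is_depression_related(post):
    """Flag a post by subreddit or by any term from the assumed word list."""
    if post["subreddit"].lower() in DEPRESSION_SUBREDDITS:
        return True
    return any(tok in DEPRESSION_TERMS for tok in tokenize(post["text"]))


def build_conditions(posts_by_user):
    """Return (all_text, restricted) mappings of user -> list of posts."""
    all_text = {u: list(ps) for u, ps in posts_by_user.items()}
    restricted = {
        u: [p for p in ps if not is_depression_related(p)]
        for u, ps in posts_by_user.items()
    }
    return all_text, restricted


def word_counts(posts):
    """Count tokens over a list of posts, as input to a lexical comparison."""
    counts = Counter()
    for p in posts:
        counts.update(tokenize(p["text"]))
    return counts


# Invented toy data, for illustration only.
toy = {
    "user_a": [
        {"subreddit": "depression", "text": "I can't get out of bed."},
        {"subreddit": "cooking", "text": "My antidepressant ruins my appetite."},
        {"subreddit": "cooking", "text": "I made soup again."},
    ],
}
all_text, restricted = build_conditions(toy)
```

Under these assumptions, the same lexical analysis or classifier would be run once on `all_text` and once on `restricted`, and the gap between the two results would indicate how much a model leans on explicit talk about depression.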
