Journal: JMIR Medical Informatics

Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study

Abstract

Background: According to a 2017 World Health Organization report, almost 1 in every 20 people in China had depression. However, depression is usually difficult to diagnose clinically because of slow observation, high cost, and patient resistance. Meanwhile, with the rapid emergence of social networking sites, people frequently share their daily lives and disclose their inner feelings online, which makes it possible to identify mental conditions from this rich text information. Much has been achieved with English web-based corpora, but in research in China, the extraction of language features from web-related depression signals is still at a relatively early stage.

Objective: The purpose of this study was to propose an effective approach for constructing a depression-domain lexicon. The lexicon will contain language features that could help identify social media users who potentially have depression. Our study also compared detection performance with and without the lexicon.

Methods: We automatically constructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. These two methods combined performed well in a specific corpus during construction. The lexicon was built from 111,052 Weibo microblogs posted by 1868 users who were depressed or nondepressed. For depression detection, we considered six features and used five classification methods to test detection performance.

Results: The experimental results showed that, in terms of the F1 value, our automatic construction method performed 1% to 6% better than baseline approaches and was more effective and more stable. When applied to detection models such as logistic regression and support vector machine, our lexicon helped the models perform 2% to 9% better and improved the final accuracy of potential depression detection.

Conclusions: Our depression-domain lexicon proved to be a meaningful input for classification algorithms, providing linguistic insights into the depressive status of test subjects. We believe that this lexicon will enhance early detection of depression among social media users. Future work will need to use a larger corpus and more complex methods.
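
The pipeline named under Methods (word embeddings, a semantic relationship graph, and label propagation) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for Word2Vec embeddings, and the words, seed labels, similarity threshold, and iteration count are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy embeddings standing in for Word2Vec vectors trained on a
# microblog corpus; the words and numbers are illustrative, not from the study.
EMBEDDINGS = {
    "sad":      [0.9, 0.1, 0.0],
    "hopeless": [0.8, 0.2, 0.1],
    "insomnia": [0.7, 0.3, 0.0],
    "happy":    [0.1, 0.9, 0.1],
    "sunny":    [0.0, 0.8, 0.3],
    "picnic":   [0.1, 0.7, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(embeddings, threshold=0.8):
    """Semantic graph: link word pairs whose similarity is >= threshold."""
    words = list(embeddings)
    graph = {w: {} for w in words}
    for i, w in enumerate(words):
        for x in words[i + 1:]:
            sim = cosine(embeddings[w], embeddings[x])
            if sim >= threshold:
                graph[w][x] = sim
                graph[x][w] = sim
    return graph


def propagate_labels(graph, seeds, iterations=20):
    """Spread seed scores (+1 depression-related, -1 not) over the graph.

    Seed words stay clamped to their labels; every other word takes the
    similarity-weighted mean of its neighbours' scores on each pass.
    """
    scores = {w: seeds.get(w, 0.0) for w in graph}
    for _ in range(iterations):
        updated = {}
        for w, neighbours in graph.items():
            if w in seeds:
                updated[w] = seeds[w]
            elif neighbours:
                total = sum(neighbours.values())
                updated[w] = sum(
                    sim * scores[x] for x, sim in neighbours.items()
                ) / total
            else:
                updated[w] = 0.0
        scores = updated
    return scores


def build_lexicon(embeddings, seeds, threshold=0.8, cutoff=0.0):
    """Return the words whose propagated score exceeds `cutoff`, sorted."""
    graph = build_graph(embeddings, threshold)
    scores = propagate_labels(graph, seeds)
    return sorted(w for w, s in scores.items() if s > cutoff)


# Hypothetical seed words: one labelled depression-related, one not.
LEXICON = build_lexicon(EMBEDDINGS, {"sad": 1.0, "happy": -1.0})
print(LEXICON)
```

In this sketch the seed labels reach unlabelled words only through graph edges, so a word joins the lexicon when its embedding lies close to a depression-related seed or to words already pulled toward one. A real pipeline would presumably train the embeddings on the corpus and tune the threshold against labelled data.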
