Conference: International Conference on Artificial Intelligence in Medicine

Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes


Abstract

Breast cancer patients often discontinue their long-term treatments, such as hormone therapy, increasing the risk of cancer recurrence. These discontinuations may be caused by adverse patient-centered outcomes (PCOs) due to hormonal drug side effects or other factors. PCOs are not detectable through laboratory tests and are sparsely documented in electronic health records. Thus, there is a need to explore complementary sources of information for PCOs associated with breast cancer treatments. Social media is a promising resource, but extracting true PCOs from it first requires the accurate detection of real breast cancer patients. We describe a natural language processing (NLP) pipeline for automatically detecting breast cancer patients from Twitter based on their self-reports. The pipeline uses breast cancer-related keywords to collect streaming data from Twitter, applies NLP patterns to filter out noisy posts, and then employs a machine learning classifier trained on manually annotated data (n = 5,019) to distinguish firsthand self-reports of breast cancer from other tweets. A classifier based on bidirectional encoder representations from transformers (BERT) showed human-like performance, achieving an F1-score of 0.857 for the positive class (inter-annotator agreement: 0.845, Cohen's kappa) and considerably outperforming the next best classifier, a recurrent neural network with bidirectional long short-term memory (F1-score: 0.670). Qualitative analyses of posts from automatically detected users revealed discussions about side effects, non-adherence, and mental health conditions, illustrating the feasibility of our social media-based approach for studying breast cancer-related PCOs in a large population.
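The first two pipeline stages (keyword collection and pattern filtering) and the reported metric can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the keyword list, the first-person regular expression, and the helper names are hypothetical and are not taken from the paper, and the BERT classifier stage is not reproduced.

```python
import re

# Hypothetical keyword list; the keywords actually used are not given in the abstract.
KEYWORDS = ("breast cancer", "tamoxifen", "mastectomy")

# Hypothetical first-person pattern standing in for the paper's NLP patterns:
# "I" or "my" followed, within the same sentence, by "breast cancer".
SELF_REPORT = re.compile(r"\b(?:i|my)\b[^.!?]{0,40}\bbreast cancer\b", re.IGNORECASE)


def matches_keywords(text):
    """Stage 1: keep posts that mention any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def passes_pattern_filter(text):
    """Stage 2: keep posts that look like a personal mention."""
    return bool(SELF_REPORT.search(text))


def candidate_posts(posts):
    """Posts that would be passed on to a classifier (stage 3, not shown)."""
    return [p for p in posts if matches_keywords(p) and passes_pattern_filter(p)]


def f1_positive(y_true, y_pred):
    """F1-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A pattern this loose would still let through posts such as "My mom has breast cancer", which is the kind of non-firsthand mention the trained classifier is described as separating from true self-reports.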
