JMIR Public Health and Surveillance

Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study

Abstract

Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored, because studying them requires ground truth data.

Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset.

Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals to assess the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users' tweets.

Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Among these participants' tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were generally not predictable either. The study sample and a random sample from Twitter were predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).

Conclusions: To our knowledge, this is the first attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance work using internet-sourced data.
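
The AUC values reported in the Results are easiest to read against a chance baseline of 0.5. The following is a minimal illustrative sketch, not the authors' code: the function name `auc` and the toy labels and scores are hypothetical and are not taken from the study. It computes AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 values; scores: sequence of floats, same length.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study.
labels = [1, 0, 1, 0, 1, 0]
perfect = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]   # every positive outranks every negative
uninformative = [0.5] * 6                  # identical scores carry no signal

print(auc(labels, perfect))        # 1.0
print(auc(labels, uninformative))  # 0.5
```

On this scale, the reported values of 0.51 and 0.53 sit very close to the uninformative case, while 0.67 indicates a moderate but real separation between the study sample and the random Twitter sample.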
