Journal: PLoS One

Neighborhood level chronic respiratory disease prevalence estimation using search query data


Abstract

Estimation of disease prevalence at sub-city neighborhood scale allows early and targeted interventions that can help save lives and reduce public health burdens. However, the prohibitive cost of highly localized data collection and the sparsity of representative signals have made it challenging to identify disease prevalence at neighborhood scale. To overcome this challenge, we utilize alternative data sources that are both less sparse and representative of localized disease prevalence: using query data from a large commercial search engine, we identify the prevalence of respiratory illness in the United States, localized to census tract geographic granularity. Focusing on asthma and Chronic Obstructive Pulmonary Disease (COPD), we construct a set of features based on searches for symptoms, medications, and disease-related information, and use these to identify illness rates in more than 23,000 tracts in 500 cities across the United States. Out-of-sample model estimates from search data alone correlate with ground truth illness rate estimates from the CDC at 0.69 to 0.76, with simple additions to these models raising those correlations to as high as 0.84. We then show that in practice search query data can be added to other relevant data, such as census or land cover data, to boost results, with models that incorporate all data sources correlating with ground truth data at 0.91 for asthma and 0.88 for COPD.
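
The abstract reports out-of-sample correlations between model estimates and CDC ground truth but does not describe the models, the features, or the type of correlation. The following is a minimal sketch of how such an evaluation might look, assuming a Pearson correlation, a single hypothetical search-rate feature, a one-variable least-squares fit, and 5-fold cross-validation. The data are synthetic placeholders, and none of the names or numbers come from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def out_of_sample_predictions(xs, ys, k=5):
    """Predict each tract with a model fitted on the other folds only."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]
    return preds


# Synthetic stand-in data (NOT from the paper): one hypothetical search-rate
# feature per tract and a noisy "ground truth" prevalence derived from it.
random.seed(0)
search_rate = [random.random() for _ in range(200)]
prevalence = [8.0 + 4.0 * x + random.gauss(0, 1.0) for x in search_rate]

preds = out_of_sample_predictions(search_rate, prevalence)
r = pearson(preds, prevalence)
print(f"out-of-sample Pearson r = {r:.2f}")
```

Because every tract is predicted by a model that never saw it during fitting, the resulting correlation reflects generalization to unseen tracts rather than in-sample fit, which is the sense in which the abstract uses "out-of-sample".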
