Improving disease surveillance: Sentinel surveillance network design and novel uses of Wikipedia.



Abstract

Traditional disease surveillance systems are instrumental in guiding policy-makers' decisions and in understanding disease dynamics. The first study in this dissertation examines sentinel surveillance network design. We consider three location-allocation models: two based on the maximal coverage model (MCM) and one based on the K-median model. The MCM selects sites that maximize the total number of people within a specified distance of a site. The K-median model minimizes the sum of the distances from each individual to that individual's nearest site. Using a ground-truth dataset of two million de-identified Medicaid billing records spanning eight complete influenza seasons, together with an evaluation function based on the Huff spatial interaction model, we empirically compare candidate networks against the existing volunteer-based Iowa Department of Public Health influenza-like illness network by simulating the spread of influenza across the state of Iowa. We compare networks on two metrics: outbreak intensity (i.e., disease burden) and outbreak timing (i.e., the start, peak, and end of the epidemic). We show that it is possible to design a network that matches the outbreak intensity performance of the status quo network using two fewer sites. We also show that if outbreak timing detection is of primary interest, a network matching the existing network's performance can be built with 42% fewer sites. Finally, to demonstrate the broader applicability of these location-allocation models, we examine primary stroke center selection; we describe the ineffectiveness of the current self-initiated approach and argue for a more organized primary stroke center system.

While these traditional disease surveillance systems are important, they have several downsides.
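As an aside, the two location-allocation objectives can be illustrated with a small sketch: a greedy heuristic for the MCM's coverage objective, plus the K-median cost function. The coordinates, Euclidean metric, and greedy strategy here are illustrative assumptions only; the dissertation formulates these as optimization models and evaluates networks with the Huff spatial interaction model rather than this toy setup.

```python
def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def greedy_mcm(people, candidates, radius, k):
    """Greedy approximation of the maximal coverage model (MCM):
    repeatedly add the candidate site that newly covers the most people
    within `radius`. Returns the chosen sites and the number covered."""
    chosen, covered = [], set()
    for _ in range(k):
        def gain(site):
            # People newly covered if this site were added.
            return sum(1 for i, p in enumerate(people)
                       if i not in covered and dist(p, site) <= radius)
        site = max((s for s in candidates if s not in chosen), key=gain)
        chosen.append(site)
        covered |= {i for i, p in enumerate(people) if dist(p, site) <= radius}
    return chosen, len(covered)

def k_median_cost(people, sites):
    """K-median objective: total distance from each individual to that
    individual's nearest site (lower is better)."""
    return sum(min(dist(p, s) for s in sites) for p in people)
```

The two objectives reward different network shapes: the MCM favors sites near dense population clusters, while the K-median objective also penalizes leaving remote individuals far from every site.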
First, due to a complex reporting hierarchy, there is generally a reporting lag; most diseases in the United States, for example, are reported with a lag of approximately 1-2 weeks. Second, many regions of the world lack trustworthy or reliable data. As a result, there has been a surge of research into using publicly available internet data for disease surveillance. The second and third studies in this dissertation analyze Wikipedia's viability in this sphere.

The first of these two studies examines Wikipedia's access logs. Hourly access logs dating back to December 2007 are freely available for anyone to download. These logs contain, among other things, the total number of accesses for every article in Wikipedia. Using a linear model and a simple article-selection procedure, we show that it is possible to nowcast, and in some cases forecast up to the 28 days tested, in 8 of the 14 disease-location contexts considered. We also demonstrate that it may be possible in some cases to train a model in one context and use the same model to nowcast or forecast in another context with poor surveillance data.

The second Wikipedia study examines disease-relevant data found in article content. A number of disease outbreaks are meticulously tracked on Wikipedia, with case counts, death counts, and hospitalization counts often provided in the article narrative. Using a dataset created from 14 Wikipedia articles, we trained a named-entity recognizer (NER) to recognize and tag these phrases; the NER achieved an F1 score of 0.753. In addition to these narrative counts, we tested the accuracy of tabular data using the article on the 2014 West African Ebola virus disease epidemic. This article, like a number of other disease articles on Wikipedia, contains granular case counts and death counts for each country affected by the disease.
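The access-log study's linear model can be sketched as a one-variable least-squares fit of surveillance case counts against article view counts. The single-predictor closed form and the variable names here are illustrative assumptions; the dissertation's actual pipeline selects multiple articles and fits over hourly access logs.

```python
def fit_nowcast(views, cases):
    """Ordinary least-squares fit of a one-variable linear nowcasting
    model: estimated cases = w * views + b.
    `views` holds per-period Wikipedia article view counts; `cases`
    holds the corresponding surveillance case counts."""
    n = len(views)
    mean_v = sum(views) / n
    mean_c = sum(cases) / n
    var = sum((v - mean_v) ** 2 for v in views)
    cov = sum((v - mean_v) * (c - mean_c) for v, c in zip(views, cases))
    w = cov / var            # slope: cases gained per additional view
    b = mean_c - w * mean_v  # intercept
    return w, b

def nowcast(views, w, b):
    """Estimate case counts from view counts using the fitted model."""
    return [w * v + b for v in views]
```

Nowcasting applies the fitted model to the current period's view counts (available immediately), sidestepping the 1-2 week reporting lag; forecasting applies it to periods further ahead.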
By computing the root-mean-square error between the Wikipedia time series and a ground truth time series, we show that the Wikipedia time series are both timely and accurate.
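The comparison metric is standard; a minimal sketch of root-mean-square error between two equal-length time series:

```python
import math

def rmse(wiki_series, truth_series):
    """Root-mean-square error between a Wikipedia-derived time series
    and a ground-truth surveillance series of equal length
    (lower is better)."""
    if len(wiki_series) != len(truth_series):
        raise ValueError("series must have equal length")
    squared = [(w - t) ** 2 for w, t in zip(wiki_series, truth_series)]
    return math.sqrt(sum(squared) / len(squared))
```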
Record details

  • Author: Fairchild, Geoffrey Colin
  • Affiliation: The University of Iowa
  • Degree-granting institution: The University of Iowa
  • Subjects: Computer science; Epidemiology; Web studies
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 151 p.
  • Format: PDF
  • Language: English
