
Comparison of Machine Learning Classifiers for Influenza Detection from Emergency Department Free-text Reports

Abstract

Influenza recurs every year and has the potential to become a pandemic, so early detection requires an effective biosurveillance system. Our previous studies showed that electronic Emergency Department (ED) free-text reports can improve influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling clinical information that is missing from the free-text reports. We randomly selected 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two datasets: training (468 cases and 29,004 controls) and test (176 cases and 1,620 controls). We used Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode each of them as one of three values: Acute, Non-acute, or Missing. All ML classifiers had areas under the ROC curve (AUC) ranging from 0.88 to 0.93 and performed significantly better than the expert-built Bayesian model. Encoding absent clinical information as an explicit Missing value (treating it as not missing at random) consistently improved performance in 3 of 4 ML classifiers compared with leaving it unassigned (treating it as missing completely at random). Given the large number of training cases, the case/control ratio did not affect classification performance. Our study demonstrates that ED reports, combined with ML, NLP, and explicit handling of missing values, have great potential for the detection of infectious diseases.
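The two ways of handling missing findings described in the abstract, and the AUC used to compare classifiers, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the categorical naive Bayes model is a stand-in and is not claimed to be one of the paper's seven classifiers, the names `train_nb`, `score`, and `auc` are hypothetical, and the six toy rows are invented and are not Topaz output or study data.

```python
import math
from collections import defaultdict

def auc(scores, labels):
    """Chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_nb(rows, labels, missing_as_value):
    """Fit a categorical naive Bayes model (illustrative stand-in).

    missing_as_value=True  : "M" is a third finding value (not missing at random).
    missing_as_value=False : "M" findings are skipped (missing completely at random).
    """
    class_counts = {0: 0, 1: 0}
    counts = defaultdict(int)  # (feature index, value, class) -> count
    totals = defaultdict(int)  # (feature index, class) -> number of values used
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for j, v in enumerate(row):
            if v == "M" and not missing_as_value:
                continue
            counts[(j, v, y)] += 1
            totals[(j, y)] += 1
    return {
        "class_counts": class_counts,
        "counts": counts,
        "totals": totals,
        "k": 3 if missing_as_value else 2,  # number of values per finding
        "missing_as_value": missing_as_value,
    }

def score(model, row):
    """Log-odds of influenza for one report, with Laplace smoothing."""
    cc = model["class_counts"]
    log_odds = math.log(cc[1]) - math.log(cc[0])  # assumes both classes occur
    for j, v in enumerate(row):
        if v == "M" and not model["missing_as_value"]:
            continue  # a skipped finding contributes no evidence
        p1 = (model["counts"].get((j, v, 1), 0) + 1) / (model["totals"].get((j, 1), 0) + model["k"])
        p0 = (model["counts"].get((j, v, 0), 0) + 1) / (model["totals"].get((j, 0), 0) + model["k"])
        log_odds += math.log(p1) - math.log(p0)
    return log_odds

# Invented toy reports: two findings each, "A" = Acute, "N" = Non-acute, "M" = Missing.
TOY_ROWS = [("A", "A"), ("A", "M"), ("A", "N"), ("M", "M"), ("N", "M"), ("M", "N")]
TOY_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = influenza case, 0 = control

for flag in (True, False):
    model = train_nb(TOY_ROWS, TOY_LABELS, missing_as_value=flag)
    scores = [score(model, r) for r in TOY_ROWS]
    print("missing_as_value =", flag, "AUC =", auc(scores, TOY_LABELS))
```

In this sketch, a report whose findings are all missing scores exactly at the prior when missing values are skipped, while the explicit-value configuration lets the absence of a finding shift the score. That difference is what the comparison in the abstract is about; the toy numbers themselves say nothing about the study's results.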
