Journal: International Journal of Population Data Science

Development of data-driven framework for automatically identifying patient cohorts from linked electronic health records


Abstract

Objectives

1) To develop a fully data-driven framework for automatically identifying patients with a condition from routine electronic primary care records; 2) to identify informative codes (risk factors) for arthropathy conditions in primary care records that can accurately predict a diagnosis of those conditions in secondary care records.

Approach

This study linked routine primary and secondary care records in Wales, UK, held in the SAIL (Secure Anonymised Information Linkage) databank, with the secondary care records used as the gold standard. We proposed using machine learning techniques to extract patient information and identify cohorts with a condition from the large, high-dimensional linked dataset in the following phases: data preparation, performed in a machine learning context; pre-selection of initial features, then ranking and selecting features into a meaningful subset using feature selection methods; and development of an identification algorithm that incorporates mechanisms for tackling the imbalanced nature of the data. The framework was then validated on an independent dataset and compared with an existing algorithm that had been developed using expert clinician knowledge of arthropathy conditions.

Results

Rheumatoid arthritis (RA) and ankylosing spondylitis (AS) were used to demonstrate the feasibility of this framework. Linking primary care records with the secondary care rheumatology clinical system yielded 9,657 patients, including 1,484 RA patients and 204 AS patients. The proposed framework identified various compact subsets of informative features (risk factors) from 43,100 potential Read codes. Applied to independent test data, the framework achieved classification accuracy and positive predictive value (PPV) of 86.19% and 88.46% respectively for RA, and 99.23% and 97.75% respectively for AS. These figures are comparable with the performance of the clinical knowledge-based method: accuracy of 85.85% and PPV of 85.28% for RA, and accuracy of 97.86% and PPV of 95.65% for AS.

Conclusion

The proposed data-driven framework provides a rapid and cost-effective way of reliably identifying patients with a medical condition from primary care data. It performed as well as the clinically derived algorithm. The framework is not intended to substitute for clinical expertise; instead, it provides a decision support tool for clinicians during their decision process, in particular the selection of patients for clinical trials.
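The abstract describes ranking Read codes with feature selection methods and reports accuracy and PPV, but it does not name the specific methods used. As a minimal sketch only, assuming binary code-presence features and a chi-squared ranking (a commonly used technique swapped in here for illustration, not necessarily the authors' method), the ranking and evaluation steps might look like the following. All function names, code labels, and the toy data are hypothetical.

```python
def chi2_score(present, labels):
    """Chi-squared statistic of a 2x2 table: code present/absent vs. case/control."""
    n = len(labels)
    a = sum(1 for p, y in zip(present, labels) if p and y)          # present, case
    b = sum(1 for p, y in zip(present, labels) if p and not y)      # present, control
    c = sum(1 for p, y in zip(present, labels) if not p and y)      # absent, case
    d = n - a - b - c                                               # absent, control
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # code (or label) is constant, so it carries no information
    return n * (a * d - b * c) ** 2 / denom


def rank_codes(records, labels, top_k):
    """Rank codes by chi-squared score; records is a list of per-patient code sets."""
    codes = sorted(set().union(*records))
    scores = {
        code: chi2_score([code in rec for rec in records], labels)
        for code in codes
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(codes, key=lambda code: (-scores[code], code))[:top_k]


def accuracy_and_ppv(y_true, y_pred):
    """Return (accuracy, positive predictive value) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, ppv


# Hypothetical toy data: six patients, three made-up codes, 1 = has the condition.
records = [
    {"code_A", "code_B"}, {"code_A"}, {"code_A", "code_C"},
    {"code_B"}, {"code_C"}, {"code_B", "code_C"},
]
labels = [1, 1, 1, 0, 0, 0]

top_codes = rank_codes(records, labels, top_k=2)
acc, ppv = accuracy_and_ppv([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(top_codes, round(acc, 4), round(ppv, 4))
```

In the collected cohort, 204 of 9,657 patients (about 2.1%) had AS, so a rule that labels every patient as negative would already score close to 98% accuracy on similarly distributed data. That is one reason to read PPV alongside accuracy when classes are this imbalanced.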
