Development of data-driven framework for automatically identifying patient cohorts from linked electronic health records

Fabiola Fernández-Gutiérrez; Jonathan Kennedy; Roxanne Cooksey; Mark Atkinson; Ernest Choy; Sinead Brophy; Shang-Ming Zhou

Abstract

Objectives: (1) To develop a fully data-driven framework for automatically identifying patients with a condition from routine electronic primary care records; (2) to identify informative codes (risk factors) for arthropathy conditions in primary care records that can accurately predict a diagnosis of those conditions in secondary care records.

Approach: This study linked routine primary and secondary care records in Wales, UK, held in the SAIL (Secure Anonymised Information Linkage) Databank, with the secondary care records used as the gold standard. We proposed using machine learning techniques to extract patient information and identify cohorts with a condition from the large, high-dimensional linked dataset in the following phases: data preparation, performed in a machine learning context; pre-selection of initial features, followed by ranking and selecting features into a meaningful subset using feature selection methods; and development of an identification algorithm that incorporates mechanisms for tackling the imbalanced nature of the data. This data-driven framework was then validated on an independent dataset and compared with an existing algorithm that had been developed using expert clinician knowledge of arthropathy conditions.

Results: Rheumatoid arthritis (RA) and ankylosing spondylitis (AS) were used to demonstrate the feasibility of this framework. By linking primary care records with the secondary care rheumatology clinical system, we collected 9,657 patients, including 1,484 RA patients and 204 AS patients. The proposed framework identified various compact subsets of informative features (risk factors) from 43,100 potential Read codes. Applied to an independent test dataset, the framework achieved a classification accuracy of 86.19% and a positive predictive value (PPV) of 88.46% for RA, and an accuracy of 99.23% and a PPV of 97.75% for AS. This performance is comparable with that of the clinical knowledge-based method, which achieved an accuracy of 85.85% and a PPV of 85.28% for RA, and an accuracy of 97.86% and a PPV of 95.65% for AS.

Conclusion: The proposed data-driven framework provides a rapid and cost-effective way of reliably identifying patients with a medical condition from primary care data. It performed as well as the clinically derived algorithm. The framework is not intended to substitute for clinical expertise; instead, it provides a decision support tool for clinicians during their decision process, in particular when selecting patients for clinical trials.
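The data preparation and feature-ranking phases described in the Approach can be illustrated with a minimal Python sketch. It assumes a hypothetical input format: primary care events as (patient_id, code) pairs, plus the set of patient IDs with a confirmed diagnosis in the linked secondary care records, which serves as the gold-standard label. Candidate codes are pre-selected by a minimum patient count and then ranked by mutual information with the label. The abstract does not name the feature selection methods the authors used, so mutual information is a stand-in here, and the code names (CODE_A and so on) are placeholders rather than real Read codes.

```python
import math
from collections import defaultdict


def build_dataset(events, patients, positive_ids):
    """Convert (patient_id, code) events into one set of codes per patient plus a 0/1 label.

    `positive_ids` stands in for the gold standard taken from linked secondary
    care records (hypothetical input format, not the SAIL schema).
    """
    codes_by_patient = defaultdict(set)
    for pid, code in events:
        codes_by_patient[pid].add(code)
    rows = [codes_by_patient[pid] for pid in patients]
    labels = [1 if pid in positive_ids else 0 for pid in patients]
    return rows, labels


def mutual_information(rows, labels, code):
    """Mutual information, in bits, between 'code is recorded' and the binary label."""
    n = len(rows)
    joint = defaultdict(int)  # counts of (code present?, label) pairs
    for row, y in zip(rows, labels):
        joint[(code in row, y)] += 1
    n_x, n_y = defaultdict(int), defaultdict(int)  # marginal counts
    for (x, y), c in joint.items():
        n_x[x] += c
        n_y[y] += c
    # I(X;Y) = sum p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
    return sum((c / n) * math.log2(c * n / (n_x[x] * n_y[y])) for (x, y), c in joint.items())


def rank_codes(rows, labels, min_patients=1, top_k=10):
    """Pre-select codes recorded for at least `min_patients` patients, then rank by MI."""
    freq = defaultdict(int)
    for row in rows:
        for code in row:
            freq[code] += 1
    candidates = [c for c, f in freq.items() if f >= min_patients]
    # Highest MI first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(candidates, key=lambda c: (-mutual_information(rows, labels, c), c))
    return ranked[:top_k]


# Toy data: p1 and p2 have the condition in secondary care, p3 and p4 do not.
patients = ["p1", "p2", "p3", "p4"]
events = [("p1", "CODE_A"), ("p1", "CODE_B"), ("p2", "CODE_A"),
          ("p3", "CODE_C"), ("p4", "CODE_C")]
rows, labels = build_dataset(events, patients, positive_ids={"p1", "p2"})
top_codes = rank_codes(rows, labels, top_k=2)
print(top_codes)  # → ['CODE_A', 'CODE_C']
```

Mutual information rewards codes that are strongly associated with the label in either direction, which is why CODE_C, recorded only for patients without the condition, ranks alongside CODE_A. With tens of thousands of candidate codes, the frequency threshold in the pre-selection step is one simple way to shrink the candidate set before ranking.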

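The Results report accuracy and positive predictive value (PPV), and the Approach mentions mechanisms for tackling class imbalance without naming them. The sketch below assumes random undersampling of the majority class as that mechanism (a common choice, not necessarily the authors'), and uses the standard definitions accuracy = (TP + TN) / (TP + TN + FP + FN) and PPV = TP / (TP + FP). The function names and toy labels and predictions are invented for illustration; only the cohort sizes in the last line come from the abstract.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class examples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def ppv(tp, fp):
    """Positive predictive value; NaN when nothing was predicted positive."""
    return tp / (tp + fp) if tp + fp else float("nan")


# Balance a toy training set: 2 positives and 4 negatives become 2 and 2.
train_rows, train_labels = undersample(list("abcdef"), [1, 0, 0, 0, 0, 1])
print(sorted(train_labels))  # → [0, 0, 1, 1]

# Evaluate made-up predictions on a held-out set (the test set is not undersampled).
tp, tn, fp, fn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(accuracy(tp, tn, fp, fn), ppv(tp, fp))  # → 0.6 0.6666666666666666

# A classifier that labels nobody as AS would still score about 97.9% accuracy on
# the full cohort (204 AS patients out of 9,657), but its PPV is undefined.
print(round(accuracy(0, 9657 - 204, 0, 204), 4))  # → 0.9789
```

The last line illustrates why accuracy alone is a weak summary for a rare condition such as AS and why PPV is worth reporting alongside it; the prevalence in the independent test set may differ from that of the full cohort.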

Bibliographic details

Source: International Journal of Population Data Science, 2017, Issue 1

Similar literature

1. Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study [J]. Kristin M. Corey, Sehj Kashyap, Elizabeth Lorenzi. PLoS Medicine, 2018, No. 11.

2. Development of a multi-institutional cohort to facilitate cardiovascular disease biomarker validation using existing biorepository samples linked to electronic health records [J]. Cross D.S., McCarty C.A., Steinhubl S.R. Clinical Cardiology, 2013, No. 8.

3. Development of a Multi-institutional Cohort to Facilitate Cardiovascular Disease Biomarker Validation Using Existing Biorepository Samples Linked to Electronic Health Records [J]. Deanna S. Cross PhD, Catherine A. McCarty PhD MPH, Steven R. Steinhubl MD. Clinical Cardiology, 2013, No. 8.

4. FEMRL: A Framework for Large-Scale Privacy-Preserving Linkage of Patients’ Electronic Health Records [C]. Dimitrios Karapiperis, Aris Gkoulalas-Divanis, Vassilios S. Verykios. IEEE International Smart Cities Conference, 2018.

5. Design and evaluation of an associative classification framework to identify disease cohorts in the electronic health record [D]. Welch, Susan Rea, 2011.

6. Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective single-site study [O]. Kristin M. Corey, Sehj Kashyap, Elizabeth Lorenzi, 2018.

7. Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study [O]. Kristin M. Corey, Sehj Kashyap, Elizabeth Lorenzi, 2018.

8. Identifying Patients for Clinical Studies from Electronic Health Records: TREC 2012 Medical Records Track at OHSU [R]. Bedrick, S., Edinger, T., Cohen, A., 2012.
