
Automated disease cohort selection using word embeddings from Electronic Health Records

Abstract

Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHRs). As with prospective study designs, high-quality EHR-based research requires rigorous selection criteria to designate case/control status particular to each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, so algorithms have been developed for only a relatively small number of diseases. Methodologies that automatically learn features from EHRs have been used for cohort selection as well. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper evaluates how well a state-of-the-art automated feature learning method extracts research-grade cohorts for five diseases, compared against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results, with an average F-score of 0.57 and an AUC-ROC of 0.98. However, we noticed that results varied considerably between diseases, thus necessitating further investigation and/or phenotype-specific refinement of the approach before it can be readily deployed across all diseases.
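The ranking-and-threshold step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the three-dimensional concept vectors are hand-made stand-ins for embeddings that word2vec would learn from EHR data, each patient is represented as the mean of the vectors of the concepts in their record, proximity is measured by cosine distance, and the threshold of 0.1 is arbitrary. The names `mean_vector`, `cosine_distance`, and `select_cohort` are hypothetical helpers.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_cohort(query_vec, patient_vecs, threshold):
    """Rank patients by distance to the query and keep those within the threshold."""
    ranked = sorted(
        ((pid, cosine_distance(query_vec, vec)) for pid, vec in patient_vecs.items()),
        key=lambda pair: pair[1],
    )
    cohort = [pid for pid, dist in ranked if dist <= threshold]
    return ranked, cohort


# Toy concept embeddings (assumed values; real ones would be learned by word2vec).
concept_vecs = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "metformin": [0.8, 0.2, 0.1],
    "hba1c_test": [0.7, 0.3, 0.0],
    "asthma": [0.0, 0.9, 0.2],
    "albuterol": [0.1, 0.8, 0.3],
}

# Toy patient records: each patient is a list of medical concepts.
patient_records = {
    "patient_a": ["type_2_diabetes", "metformin", "hba1c_test"],
    "patient_b": ["asthma", "albuterol"],
    "patient_c": ["metformin", "asthma"],
}

# Assumption: a patient vector is the mean of that patient's concept vectors.
patient_vecs = {
    pid: mean_vector([concept_vecs[c] for c in concepts])
    for pid, concepts in patient_records.items()
}

# Query with a single medical concept and an arbitrary distance threshold.
ranked, cohort = select_cohort(concept_vecs["type_2_diabetes"], patient_vecs, threshold=0.1)

for pid, dist in ranked:
    print(f"{pid}: distance {dist:.3f}")
print("putative cohort:", cohort)
```

In this toy setup the patient whose record is dominated by diabetes-related concepts ranks closest to the query and is the only one inside the threshold. Choosing the threshold is the hard part in practice, which is consistent with the abstract's observation that results varied considerably between diseases.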