Journal: Knowledge-Based Systems

Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records


Abstract

Electronic medical records (EMRs) have become a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. Each record contains diagnostic information (diagnosis codes), procedures performed (procedure codes), and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be used to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. Such topic modeling helps to characterize the composition of patients' diseases and offers a tool for better treatment planning. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. Our motivation is that diagnosis codes are connected through the ICD-10 tree structure, which encodes semantic relationships between codes. We use a decay function to incorporate distances between words at the bottom level of the wddCRF. We derive efficient inference for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore the conditional relationships of procedure codes given a disease pattern. We derive an efficient collapsed Gibbs sampler for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction. We demonstrate that the Corr-wddCRF discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. In addition, procedure code prediction based on the Corr-wddCRF shows considerable accuracy. (C) 2016 Elsevier B.V. All rights reserved.
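The idea of turning ICD-10 tree distances into word-to-word affinities through a decay function can be illustrated with a minimal Python sketch. This is not the paper's implementation. The abstract does not say which decay function is used, so the exponential form here is an assumption. The tree is also a simplification that treats code prefixes as levels (letter, then three-character category, then full code), whereas real ICD-10 chapters and blocks are code ranges. The names `icd10_path`, `tree_distance`, `exp_decay`, and the `scale` parameter are hypothetical.

```python
import math


def icd10_path(code):
    """Return the root-to-node path for a code in a simplified prefix tree.

    Assumption: levels are letter -> 3-character category -> full code.
    Real ICD-10 chapters and blocks are code ranges, not prefixes.
    """
    code = code.strip().upper()
    path = [code[:1], code[:3]]
    if len(code) > 3:
        path.append(code)
    return path


def tree_distance(a, b):
    """Count the edges on the path between two codes in the simplified tree."""
    pa, pb = icd10_path(a), icd10_path(b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    # Edges from each node up to the lowest common ancestor
    # (the implicit root when nothing is shared).
    return (len(pa) - shared) + (len(pb) - shared)


def exp_decay(distance, scale=1.0):
    """Exponential decay (an assumed form): closer codes get higher affinity."""
    return math.exp(-distance / scale)


if __name__ == "__main__":
    # I21.x: acute myocardial infarction; I25.x: chronic ischaemic heart
    # disease; E11.x: type 2 diabetes mellitus.
    for other in ["I21.1", "I25.1", "E11.9"]:
        d = tree_distance("I21.0", other)
        print(other, d, round(exp_decay(d, scale=2.0), 3))
```

Under this sketch, sibling codes such as I21.0 and I21.1 are 2 edges apart, while codes in unrelated branches are 6 edges apart. A distance-dependent prior built on these affinities would therefore favor grouping nearby codes into the same topic.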
