Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Li Cheng; Rana Santu; Dinh Phung; Venkatesh Svetha

Abstract

Electronic medical records (EMRs) have become a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. Each medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. Such topic modeling helps to understand the composition of patient diseases and offers a tool for better treatment planning. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected through the ICD-10 tree structure, which captures semantic relationships between codes. We use a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference for the wddCRF is derived using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. In addition, procedure code prediction based on the Corr-wddCRF shows considerable accuracy. (C) 2016 Elsevier B.V. All rights reserved.
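
The idea of turning ICD-10 tree distances into link weights through a decay function can be illustrated with a minimal sketch. The abstract does not state which distance or decay function the wddCRF uses, so everything here is an assumption made for illustration: the ICD-10 hierarchy is approximated by a character-prefix tree, the decay is exponential, and the names `icd10_path`, `tree_distance`, `exp_decay`, `link_weights`, `alpha` and `scale` are hypothetical. The weighting follows the general style of a distance dependent Chinese restaurant process and is not the paper's inference procedure.

```python
import math


def icd10_path(code: str) -> list[str]:
    """Root-to-leaf path of a code in an assumed prefix tree.

    Assumption: each extra character is one level deeper, e.g.
    'I21.0' -> ['I', 'I2', 'I21', 'I210']. The real ICD-10 hierarchy
    (chapters, blocks, categories) is coarser than this.
    """
    c = code.replace(".", "").upper()
    return [c[:i] for i in range(1, len(c) + 1)]


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two codes via their deepest shared prefix."""
    pa, pb = icd10_path(a), icd10_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def exp_decay(d: float, scale: float = 2.0) -> float:
    """Assumed exponential decay: nearby codes get weights close to 1."""
    return math.exp(-d / scale)


def link_weights(codes: list[str], i: int, alpha: float = 1.0,
                 scale: float = 2.0) -> list[float]:
    """Normalised weights for code i linking to each code j.

    A self-link gets weight alpha (hypothetical concentration parameter);
    every other code gets the decayed tree distance.
    """
    raw = [
        alpha if j == i else exp_decay(tree_distance(codes[i], codes[j]), scale)
        for j in range(len(codes))
    ]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    codes = ["I21.0", "I21.1", "E11.9"]
    for code, w in zip(codes, link_weights(codes, 0)):
        print(f"{codes[0]} -> {code}: {w:.3f}")
```

Under these assumptions, a code such as I21.0 receives a much larger link weight to its sibling I21.1 than to the unrelated E11.9, which is the intuition behind using word distances to encourage semantically coherent topics.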

Bibliographic record

Source
Knowledge-Based Systems | 2016, May 1 issue | pp. 168-182 | 15 pages
Authors
Li Cheng; Rana Santu; Dinh Phung; Venkatesh Svetha;
Affiliations

Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia (all four authors)

Original format: PDF
Language: English
Keywords
Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction;
