Conference: Canadian Conference on Artificial Intelligence

Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques

Abstract

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM, derived from the International Classification of Diseases (ICD). ICD-9 codes are generally extracted by trained human coders, who read all artifacts available in a patient's medical record and follow specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from the textual narratives included in electronic medical records (EMRs). Earlier attempts at automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from the EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept mapping of medical concepts, and extractive text summarization techniques, we achieve an example-based average recall of 0.42 with an average precision of 0.47. Compared with a baseline that uses only NER, the graph-based approach improves recall by 12% and the extractive text summarization approach improves precision by 7%. Although diagnosis codes are complex concepts, often expressed in text with significant long-range, non-local dependencies, our present work shows the potential of unsupervised methods for extracting a portion of the codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult.
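The results above are reported as example-based (per-visit) average recall and precision. Under the usual multi-label definition, each visit is scored on its own, with precision = |gold ∩ predicted| / |predicted| and recall = |gold ∩ predicted| / |gold|, and the per-visit scores are then averaged. The sketch below assumes each visit maps to a set of ICD-9 code strings. It illustrates that standard definition and is not the authors' evaluation code. The function name `example_based_pr`, the zero-score convention for empty sets, and the sample visits are all hypothetical.

```python
from typing import Dict, Set, Tuple


def example_based_pr(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Return (average precision, average recall) over every visit in `gold`.

    Hypothetical helper: each visit is scored independently, then averaged.
    """
    if not gold:
        raise ValueError("gold must contain at least one visit")
    precisions, recalls = [], []
    for visit_id, gold_codes in gold.items():
        # A visit with no predictions is treated as an empty prediction set.
        pred_codes = predicted.get(visit_id, set())
        hits = len(gold_codes & pred_codes)
        # Assumed convention: a score whose denominator is empty counts as 0.
        precisions.append(hits / len(pred_codes) if pred_codes else 0.0)
        recalls.append(hits / len(gold_codes) if gold_codes else 0.0)
    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


if __name__ == "__main__":
    # Made-up visits for illustration only (not from the paper's dataset).
    gold = {"v1": {"428.0", "401.9", "250.00"}, "v2": {"486"}}
    pred = {"v1": {"428.0", "401.9", "486"}, "v2": {"486", "599.0"}}
    p, r = example_based_pr(gold, pred)
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.583 recall=0.833
```

Averaging per visit gives each inpatient stay equal weight, no matter how many codes it carries. For binarized label matrices, scikit-learn's `precision_score` and `recall_score` with `average="samples"` compute the same per-example averages.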