Conference: Canadian Conference on Artificial Intelligence

Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques

Abstract

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM, derived from the International Classification of Diseases (ICD). ICD-9 codes are generally extracted by trained human coders, who read all artifacts available in a patient's medical record and follow specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts at automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example-based average recall of 0.42 with an average precision of 0.47; compared with a baseline of using only NER, we observe a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long-range, non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult.
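The abstract reports example-based average precision and recall without defining them. Under the usual multi-label definition, each visit gets its own precision and recall from the overlap between its predicted and gold code sets, and those per-visit values are then averaged. The sketch below illustrates that definition only. The function name `example_based_scores`, the handling of empty sets, and the toy codes are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple


def example_based_scores(
    gold: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float]:
    """Return (average precision, average recall), averaged over visits."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted need one code set per visit")
    if not gold:
        raise ValueError("at least one visit is required")

    precisions = []
    recalls = []
    for gold_codes, pred_codes in zip(gold, predicted):
        correct = len(gold_codes & pred_codes)
        # Assumption: an empty set scores 0 for that visit instead of being skipped.
        precisions.append(correct / len(pred_codes) if pred_codes else 0.0)
        recalls.append(correct / len(gold_codes) if gold_codes else 0.0)

    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


# Toy data for two hypothetical visits (not from the paper's dataset).
gold = [{"428.0", "401.9"}, {"250.00"}]
predicted = [{"428.0", "584.9"}, {"250.00", "401.9"}]

avg_precision, avg_recall = example_based_scores(gold, predicted)
print(f"precision={avg_precision:.2f} recall={avg_recall:.2f}")
```

Because the averaging is per visit, a visit with many codes carries the same weight as a visit with a single code. This differs from micro-averaging, which pools all codes across visits before computing the ratios.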