International Journal of Population Data Science

Codifying unstructured data: A Natural Language Processing approach to extract rich data from clinical letters


Abstract

Objectives: Electronic healthcare records (EHRs) are the main data sources that facilitate epidemiology research. Routinely collected data, such as primary and secondary care records, are now easily linked to produce novel and high-impact research. There are, however, rich data locked in the free text of clinical letters that are not otherwise translated into EHRs. It is highly desirable to extract this information to strengthen the body of information in existing EHRs. The Swansea Collaborative in Analysis of NLP Research (SCANR) group at Swansea University has been established to evaluate the use of Natural Language Processing platforms for obtaining new clinical data. Our objective was to use Clix Enrich to extract SNOMED concepts from a variety of clinical free texts and to produce EHRs from the extraction process.

Approach: SNOMED concepts contain common items of interest such as diagnosis, medication and symptoms, as well as contextual concepts such as historical reference and negation. Clix Enrich uses the SNOMED dictionary to encode clinical free text (pre-coordinated) and find contextually correct SNOMED concepts (post-coordinated). We used Clix Enrich to extract meaningful clinical terms from MS and Epilepsy consultant letters, as well as from presenting complaint fields from a Welsh Emergency Department (ED).

Results: We tailored Clix Enrich to extract a wide variety of clinical terms from each source (forty texts per source) and validated the extraction accuracy with clinical experts in each domain. Clix Enrich accurately extracted the correct diagnosis for MS, Epilepsy and ED attendance (100%, 95% and 80%, respectively), the dosage and frequency of anti-epileptic medication and MS modifying therapy (90% and 100%, respectively), and the EDSS score (94%). We note that a probable source of the discrepancy in extraction accuracy between letter sources is the frequency of abbreviated terms, particularly within the presenting complaint field of the ED sample.

Conclusion: Clix Enrich can be used to accurately extract SNOMED concepts from clinical letters. The resulting datasets are readily available for linkage and can be linked to existing EHRs that adopt the SNOMED coding structure or backward-compatible hierarchies. Clix Enrich comes with out-of-the-box extraction methods, but the optimum way to extract the correct information is to build in custom queries, which requires clinical expertise to validate the extraction.
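To make the idea of dictionary-based encoding with contextual qualifiers (negation, historical reference) more concrete, the following is a minimal Python sketch. It is not Clix Enrich and does not reflect its API or query language; the dictionary `TOY_DICTIONARY`, the cue lists, the `Match` type and the `extract_concepts` function are all hypothetical, and the codes are placeholders rather than real SNOMED identifiers.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionary: surface term -> (concept label, placeholder code).
# The codes are illustrative placeholders, NOT real SNOMED identifiers.
TOY_DICTIONARY = {
    "epilepsy": ("Epilepsy", "TOY-001"),
    "multiple sclerosis": ("Multiple sclerosis", "TOY-002"),
    "headache": ("Headache", "TOY-003"),
    "chest pain": ("Chest pain", "TOY-004"),
}

# Assumed cue phrases; a real system would use far richer context rules.
NEGATION_CUES = ("no", "not", "denies", "without")
HISTORY_CUES = ("history of", "previous", "previously")


@dataclass
class Match:
    start: int        # character offset of the term in the text
    concept: str      # dictionary concept label
    code: str         # placeholder code
    negated: bool     # a negation cue precedes the term
    historical: bool  # a history cue precedes the term


def _has_cue(context: str, cues) -> bool:
    """Return True if any cue occurs as a whole word or phrase in context."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", context) for cue in cues)


def extract_concepts(text: str, window: int = 4) -> list:
    """Find dictionary terms and flag negation/history from preceding words.

    Only the `window` words before a term, within the same sentence,
    are inspected for cues.
    """
    lowered = text.lower()
    matches = []
    for term, (concept, code) in TOY_DICTIONARY.items():
        for hit in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Keep only the part of the current sentence before the term.
            sentence_part = re.split(r"[.;!?]", lowered[:hit.start()])[-1]
            context = " ".join(re.findall(r"[a-z]+", sentence_part)[-window:])
            matches.append(Match(
                start=hit.start(),
                concept=concept,
                code=code,
                negated=_has_cue(context, NEGATION_CUES),
                historical=_has_cue(context, HISTORY_CUES),
            ))
    # Report matches in the order they appear in the text.
    return sorted(matches, key=lambda m: m.start)


if __name__ == "__main__":
    letter = "Diagnosis: epilepsy. No headache today. Previous history of chest pain."
    for match in extract_concepts(letter):
        print(match)
```

A plain dictionary lookup like this one only matches terms that appear in the dictionary verbatim, so abbreviated terms would be missed unless they are added as entries. That is consistent with the lower accuracy reported above for the abbreviation-heavy ED presenting complaint field, although the sketch says nothing about how Clix Enrich itself handles abbreviations.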
