Journal: npj Digital Medicine

Natural language generation for electronic health records


Abstract

One broad goal of biomedical informatics is to generate fully synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and to promote methodological research. A variety of methods exist for generating synthetic EHRs, but they are not capable of generating unstructured text, such as emergency department (ED) chief complaints, histories of present illness, or progress notes. Here, we use the encoder–decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, such as age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record-sentence pairs. As a side effect of the model's optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics.
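As a rough illustration of the pipeline the abstract describes (discrete record variables in, chief-complaint text out), the sketch below one-hot encodes a record and then decodes tokens one at a time. It is a minimal sketch under stated assumptions, not the authors' implementation: the field vocabularies, the names `encode_record`, `greedy_decode`, and `toy_step`, and the use of greedy decoding are all hypothetical, and `toy_step` is a fixed lookup standing in for a trained decoder network.

```python
from typing import Callable, Dict, List

# Hypothetical vocabularies for the discrete record fields (illustrative only).
FIELD_VALUES: Dict[str, List[str]] = {
    "age_group": ["0-17", "18-44", "45-64", "65+"],
    "gender": ["F", "M"],
    "diagnosis": ["asthma", "ankle sprain", "chest pain"],
}


def encode_record(record: Dict[str, str]) -> List[int]:
    """Concatenate one-hot encodings of each field into a single input vector."""
    vector: List[int] = []
    for field, values in FIELD_VALUES.items():
        one_hot = [0] * len(values)
        one_hot[values.index(record[field])] = 1
        vector.extend(one_hot)
    return vector


# A decoder step maps (record features, tokens so far) to next-token probabilities.
StepFn = Callable[[List[int], List[str]], Dict[str, float]]


def greedy_decode(step: StepFn, features: List[int], max_len: int = 10) -> List[str]:
    """Generate tokens one at a time, always taking the most probable next token."""
    tokens: List[str] = []
    for _ in range(max_len):
        probs = step(features, tokens)
        next_token = max(probs, key=probs.get)
        if next_token == "<eos>":  # end-of-sequence marker stops generation
            break
        tokens.append(next_token)
    return tokens


def toy_step(features: List[int], prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a trained decoder: a fixed script, not a learned model."""
    script = ["chest", "pain", "<eos>"]
    target = script[min(len(prefix), len(script) - 1)]
    return {tok: (0.9 if tok == target else 0.05) for tok in script}


record = {"age_group": "45-64", "gender": "M", "diagnosis": "chest pain"}
features = encode_record(record)
complaint = " ".join(greedy_decode(toy_step, features))
```

In a real system the step function would be a neural decoder conditioned on the encoded record and trained on authentic record-sentence pairs; always picking the most probable token also hints at why rare abbreviations and misspellings would tend to drop out of the generated text.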
