International Conference on Reliability, Infocom Technologies and Optimization

Redaction of Protected Health Information in EHRs using CRFs and Bi-directional LSTMs

Abstract

This paper describes the de-identification of personally identifiable information (PII) in electronic health records (EHRs) using two models: conditional random fields (CRFs) and bidirectional long short-term memory (LSTM) networks. Most medical records store private information such as PATIENT NAME, HOSPITAL NAME, and LOCATION, which must be de-identified or redacted before the records are passed on for further medical research. Removing such information with machine learning techniques begins with pre-processing the raw data through tokenization and sentence detection. Comparing the two techniques, we note that CRFs require manual feature engineering to train the model, whereas an LSTM can handle long-term dependencies without much insight into the dataset. A bidirectional LSTM network was used to generate context information from suitable word representations. Finally, a predictive layer was applied to predict the protected health information (PHI) terms with maximum probability. We propose an efficient solution for redaction using two models, evaluated on the i2b2 gold-standard data set of clinical narratives from the 2014 De-identification Challenge; both achieve good F-scores for PHI of all types. The LSTM-based model achieved a micro-F1 measure of 0.9592, outperforming the CRF-based model.
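
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The regular-expression tokenizer, the particular features in `token_features`, the label names `PATIENT` and `HOSPITAL`, and the token-level scoring in `micro_f1` are all assumptions chosen for illustration; the i2b2 evaluation may score at the entity level instead. The sketch shows what manual feature engineering for a CRF tagger can look like, how predicted labels drive redaction, and how a micro-averaged F1 pools counts across all PHI types.

```python
import re

def tokenize(sentence):
    # Assumed tokenizer: split into word tokens and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def token_features(tokens, i):
    # Hypothetical hand-crafted features for token i, of the kind a CRF
    # tagger is usually trained on (the paper's actual features are not given).
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def redact(tokens, labels):
    # Replace every token labeled as PHI (any label other than "O")
    # with its PHI type in brackets.
    return [tok if lab == "O" else f"[{lab}]" for tok, lab in zip(tokens, labels)]

def micro_f1(gold, pred):
    # Token-level micro-averaged F1: true positives, predictions, and gold
    # labels are pooled over all PHI types; the non-PHI label "O" is ignored.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)

# Illustrative sentence and labels (made up, not taken from the i2b2 data).
tokens = tokenize("Mr. Smith was admitted to Mercy Hospital.")
gold = ["O", "O", "PATIENT", "O", "O", "O", "HOSPITAL", "HOSPITAL", "O"]
pred = ["O", "O", "PATIENT", "O", "O", "O", "HOSPITAL", "O", "O"]

print(token_features(tokens, 2))
print(" ".join(redact(tokens, pred)))
print(micro_f1(gold, pred))
```

In this toy example the predicted labels miss one HOSPITAL token, so recall drops while precision stays perfect. A bidirectional LSTM would replace `token_features` with learned word representations, which is the trade-off the abstract describes.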
