Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records

Abstract

De-identification is the task of detecting protected health information (PHI) in medical text. It is a critical step in sanitizing electronic health records (EHRs) to be shared for research. Automatic de-identification classifiers can significantly speed up the sanitization process. However, obtaining a large and diverse dataset to train such a classifier that works well across many types of medical text poses a challenge as privacy laws prohibit the sharing of raw medical records. We introduce a method to create privacy-preserving shareable representations of medical text (i.e., they contain no PHI) that does not require expensive manual pseudonymization. These representations can be shared between organizations to create unified datasets for training de-identification models. Our representation allows training a simple LSTM-CRF de-identification model to an F1 score of 97.4%, which is comparable to a strong baseline that exposes private information in its representation. A robust, widely available de-identification classifier based on our representation could potentially enable studies for which de-identification would otherwise be too costly.
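The abstract describes learning text representations with an adversarial objective and training an LSTM-CRF tagger on top of them. The following is a minimal, illustrative sketch (not the authors' implementation), assuming PyTorch, a gradient-reversal adversary that tries to recover private token labels, and a plain softmax tagging head standing in for the CRF; all module names, dimensions, and the dummy training data are placeholders.

# Illustrative sketch of adversarial representation learning for de-identification.
# Assumptions (not from the paper): PyTorch, a gradient-reversal adversary, and a
# softmax tagging head in place of the LSTM-CRF mentioned in the abstract.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class PrivacyPreservingTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, num_identities,
                 emb_dim=100, hidden_dim=128, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared encoder whose hidden states are the shareable representation.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Task head: predicts PHI tags (softmax here; the paper uses a CRF).
        self.tagger = nn.Linear(2 * hidden_dim, num_tags)
        # Adversary: tries to recover private token labels from the representation;
        # the gradient reversal pushes the encoder to remove that information.
        self.adversary = nn.Linear(2 * hidden_dim, num_identities)

    def forward(self, token_ids):
        reps, _ = self.encoder(self.embed(token_ids))   # (batch, seq, 2 * hidden)
        tag_logits = self.tagger(reps)
        adv_logits = self.adversary(GradientReversal.apply(reps, self.lambd))
        return reps, tag_logits, adv_logits


# Illustrative training step with made-up sizes and random data.
model = PrivacyPreservingTagger(vocab_size=5000, num_tags=9, num_identities=50)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 5000, (4, 20))      # batch of 4 sentences, 20 tokens each
tags = torch.randint(0, 9, (4, 20))           # gold PHI tags
identities = torch.randint(0, 50, (4, 20))    # private labels the adversary targets

_, tag_logits, adv_logits = model(tokens)
loss = (loss_fn(tag_logits.reshape(-1, 9), tags.reshape(-1))
        + loss_fn(adv_logits.reshape(-1, 50), identities.reshape(-1)))
opt.zero_grad()
loss.backward()   # the reversal layer makes the encoder oppose the adversary
opt.step()

In such a setup, the reversed gradient from the adversary drives the shared encoder toward representations from which the private labels cannot be recovered, while the tagging loss keeps them useful for PHI detection; the actual architecture and training procedure are given in the full paper.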
