Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records

Abstract

De-identification is the task of detecting protected health information (PHI) in medical text. It is a critical step in sanitizing electronic health records (EHRs) to be shared for research. Automatic de-identification classifiers can significantly speed up the sanitization process. However, obtaining a large and diverse dataset to train such a classifier that works well across many types of medical text poses a challenge as privacy laws prohibit the sharing of raw medical records. We introduce a method to create privacy-preserving shareable representations of medical text (i.e., they contain no PHI) that does not require expensive manual pseudonymization. These representations can be shared between organizations to create unified datasets for training de-identification models. Our representation allows training a simple LSTM-CRF de-identification model to an F1 score of 97.4%, which is comparable to a strong baseline that exposes private information in its representation. A robust, widely available de-identification classifier based on our representation could potentially enable studies for which de-identification would otherwise be too costly.
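The abstract describes learning text representations with an adversarial objective and training an LSTM-CRF tagger on top of them. The following is a minimal, illustrative sketch (not the authors' implementation), assuming PyTorch, a gradient-reversal adversary that tries to recover private token labels, and a plain softmax tagging head standing in for the CRF; all module names, dimensions, and the dummy training data are placeholders.

# Illustrative sketch of adversarial representation learning for de-identification.
# Assumptions (not from the paper): PyTorch, a gradient-reversal adversary, and a
# softmax tagging head in place of the LSTM-CRF mentioned in the abstract.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class PrivacyPreservingTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, num_identities,
                 emb_dim=100, hidden_dim=128, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared encoder whose hidden states are the shareable representation.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Task head: predicts PHI tags (softmax here; the paper uses a CRF).
        self.tagger = nn.Linear(2 * hidden_dim, num_tags)
        # Adversary: tries to recover private token labels from the representation;
        # the gradient reversal pushes the encoder to remove that information.
        self.adversary = nn.Linear(2 * hidden_dim, num_identities)

    def forward(self, token_ids):
        reps, _ = self.encoder(self.embed(token_ids))   # (batch, seq, 2 * hidden)
        tag_logits = self.tagger(reps)
        adv_logits = self.adversary(GradientReversal.apply(reps, self.lambd))
        return reps, tag_logits, adv_logits


# Illustrative training step with made-up sizes and random data.
model = PrivacyPreservingTagger(vocab_size=5000, num_tags=9, num_identities=50)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 5000, (4, 20))      # batch of 4 sentences, 20 tokens each
tags = torch.randint(0, 9, (4, 20))           # gold PHI tags
identities = torch.randint(0, 50, (4, 20))    # private labels the adversary targets

_, tag_logits, adv_logits = model(tokens)
loss = (loss_fn(tag_logits.reshape(-1, 9), tags.reshape(-1))
        + loss_fn(adv_logits.reshape(-1, 50), identities.reshape(-1)))
opt.zero_grad()
loss.backward()   # the reversal layer makes the encoder oppose the adversary
opt.step()

In such a setup, the reversed gradient from the adversary drives the shared encoder toward representations from which the private labels cannot be recovered, while the tagging loss keeps them useful for PHI detection; the actual architecture and training procedure are given in the full paper.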
