
Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?

Abstract

Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so) coupled with their utility motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT (Alsentzer et al., 2019). While most efforts have used deidentified EHR, many researchers have access to large sets of sensitive, non-deidentified EHR with which they might train a BERT model (or similar). Would it be safe to release the weights of such a model if they did? In this work, we design a battery of approaches intended to recover Personal Health Information (PHI) from a trained BERT. Specifically, we attempt to recover patient names and conditions with which they are associated. We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR. However, more sophisticated "attacks" may succeed in doing so. To facilitate such research, we make our experimental setup and baseline probing models available.
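
As a rough illustration of the kind of simple probing method the abstract refers to, the sketch below queries a masked-language-model head for a patient surname in the context of a clinical condition. This is a minimal sketch only: it assumes the Hugging Face transformers library and the publicly released emilyalsentzer/Bio_ClinicalBERT checkpoint as a stand-in model, and the fill-in template and the condition "pneumonia" are hypothetical placeholders rather than the paper's actual probes.

```python
# Minimal sketch of a fill-mask probe against a clinical BERT checkpoint.
# Assumptions: the Hugging Face transformers library is installed and the
# public emilyalsentzer/Bio_ClinicalBERT model stands in for a model trained
# on non-deidentified notes; the template and condition are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="emilyalsentzer/Bio_ClinicalBERT")

# Ask the model to fill in a patient name next to a target condition and
# inspect whether any plausible real surname ranks among the top guesses.
template = "Mr. [MASK] was admitted with a diagnosis of pneumonia."
for pred in fill_mask(template, top_k=5):
    print(f"{pred['token_str']!r}  score={pred['score']:.4f}")
```

A probe of this form only checks whether names surface among the top-ranked fill-mask predictions for a given condition; the abstract's finding is that such simple methods do not meaningfully extract PHI from a BERT trained over MIMIC-III.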
