
Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?

Abstract

Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so) coupled with their utility motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT (Alsentzer et al., 2019). While most efforts have used deidentified EHR, many researchers have access to large sets of sensitive, non-deidentified EHR with which they might train a BERT model (or similar). Would it be safe to release the weights of such a model if they did? In this work, we design a battery of approaches intended to recover Personal Health Information (PHI) from a trained BERT. Specifically, we attempt to recover patient names and conditions with which they are associated. We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR. However, more sophisticated "attacks" may succeed in doing so. To facilitate such research, we make our experimental setup and baseline probing models available.
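
As a rough illustration of the kind of simple probing method the abstract refers to, the sketch below queries a masked-language-model head for a patient surname in the context of a clinical condition. This is a minimal sketch only: it assumes the Hugging Face transformers library and the publicly released emilyalsentzer/Bio_ClinicalBERT checkpoint as a stand-in model, and the fill-in template and the condition "pneumonia" are hypothetical placeholders rather than the paper's actual probes.

```python
# Minimal sketch of a fill-mask probe against a clinical BERT checkpoint.
# Assumptions: the Hugging Face transformers library is installed and the
# public emilyalsentzer/Bio_ClinicalBERT model stands in for a model trained
# on non-deidentified notes; the template and condition are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="emilyalsentzer/Bio_ClinicalBERT")

# Ask the model to fill in a patient name next to a target condition and
# inspect whether any plausible real surname ranks among the top guesses.
template = "Mr. [MASK] was admitted with a diagnosis of pneumonia."
for pred in fill_mask(template, top_k=5):
    print(f"{pred['token_str']!r}  score={pred['score']:.4f}")
```

A probe of this form only checks whether names surface among the top-ranked fill-mask predictions for a given condition; the abstract's finding is that such simple methods do not meaningfully extract PHI from a BERT trained over MIMIC-III.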
