A computational model to protect patient data from location-based re-identification

Bradley Malin

Abstract

Objective: Health care organizations must preserve a patient's anonymity when disclosing personal data. Traditionally, patient identity has been protected by stripping identifiers from sensitive data such as DNA. However, simple automated methods can re-identify patient data using public information. In this paper, we present a solution to prevent a threat to patient anonymity that arises when multiple health care organizations disclose data. In this setting, a patient's location visit pattern, or "trail", can re-identify seemingly anonymous DNA to patient identity. This threat exists because health care organizations (1) cannot prevent the disclosure of certain types of patient information and (2) do not know how to systematically avoid trail re-identification. In this paper, we develop and evaluate computational methods that health care organizations can apply to disclose patient-specific DNA records that are impregnable to trail re-identification.

Methods and materials: To prevent trail re-identification, we introduce a formal model called k-unlinkability, which enables health care administrators to specify different degrees of patient anonymity. Specifically, k-unlinkability is satisfied when the trail of each DNA record is linkable to no less than k identified records. We present several algorithms that enable health care organizations to coordinate their data disclosure, so that they can determine which DNA records can be shared without violating k-unlinkability. We evaluate the algorithms with the trails of patient populations derived from publicly available hospital discharge databases. Algorithm efficacy is evaluated using metrics based on real world applications, including the number of suppressed records and the number of organizations that disclose records.

Results: Our experiments indicate that it is unnecessary to suppress all patient records that initially violate k-unlinkability. Rather, only portions of the trails need to be suppressed. For example, if each hospital discloses 100% of its data on patients diagnosed with cystic fibrosis, then 48% of the DNA records are 5-unlinkable. A naive solution would suppress the 52% of the DNA records that violate 5-unlinkability. However, by applying our protection algorithms, the hospitals can disclose 95% of the DNA records, all of which are 5-unlinkable. Similar findings hold for all populations studied.

Conclusion: This research demonstrates that patient anonymity can be formally protected in shared databases. Our findings illustrate that significant quantities of patient-specific data can be disclosed with provable protection from trail re-identification. The configurability of our methods allows health care administrators to quantify the effects of different levels of privacy protection and formulate policy accordingly.
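The k-unlinkability condition stated in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not define when a trail is "linkable", so the code assumes, purely for illustration, that a DNA trail is linkable to an identified record when every location in the DNA trail also appears in the identified record's trail. The function names, hospital labels, and toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# A trail is modeled as the set of locations (e.g. hospitals) where a record appears.
Trail = FrozenSet[str]


def is_linkable(dna_trail: Trail, identified_trail: Trail) -> bool:
    # ASSUMPTION (not from the abstract): a DNA trail can be linked to an
    # identified record when its locations are a subset of that record's locations.
    return dna_trail <= identified_trail


def link_counts(dna: Dict[str, Trail], identified: Dict[str, Trail]) -> Dict[str, int]:
    # For each DNA record, count the identified records it is linkable to.
    return {
        dna_id: sum(is_linkable(trail, other) for other in identified.values())
        for dna_id, trail in dna.items()
    }


def violations(dna: Dict[str, Trail], identified: Dict[str, Trail], k: int) -> List[str]:
    # DNA records linkable to fewer than k identified records violate k-unlinkability.
    counts = link_counts(dna, identified)
    return sorted(dna_id for dna_id, count in counts.items() if count < k)


def is_k_unlinkable(dna: Dict[str, Trail], identified: Dict[str, Trail], k: int) -> bool:
    # The condition holds when no DNA record violates it.
    return not violations(dna, identified, k)


# Hypothetical toy data: three hospitals, four identified patients, three DNA records.
identified_trails = {
    "alice": frozenset({"H1", "H2"}),
    "bob": frozenset({"H1", "H2"}),
    "carol": frozenset({"H1"}),
    "dave": frozenset({"H3"}),
}
dna_trails = {
    "d1": frozenset({"H1"}),
    "d2": frozenset({"H1", "H2"}),
    "d3": frozenset({"H3"}),
}

if __name__ == "__main__":
    print(link_counts(dna_trails, identified_trails))
    print(violations(dna_trails, identified_trails, k=2))
```

In this toy data, record d3 is linkable to only one identified record, so it is the only record that violates 2-unlinkability. A naive fix would withhold d3 entirely; the abstract reports that suppressing only portions of trails lets far more records be disclosed.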


Bibliographic details

Source
Artificial intelligence in medicine | 2007, Issue 3 | pp. 223-239 | 17 pages
Author
Bradley Malin
Author affiliation

Department of Biomedical Informatics, Eskind Biomedical Library, Fourth Floor, 2209 Garland Avenue, Vanderbilt University, Nashville, TN 37232-8340, USA

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: General medical science
Keywords
privacy; confidentiality; genomics; databases; electronic medical records; distributed systems; graphical models;


