Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets

Adrian P. Brown; Christian Borgs; Sean M. Randall; Rainer Schnell

Abstract

Background: Integrating medical data using databases from different sources by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand encrypting these identifiers. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. Real-world performance of these techniques using large-scale data is unknown up to now.

Methods: Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold-standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated.

Results: Clear text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters.

Conclusions: We argue that increased privacy of PPRL comes with the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
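To make the terms in the abstract concrete, the following is a minimal Python sketch of the general idea: several identifier fields are split into bigrams and hashed with a secret key into one composite Bloom filter (a CLK), two CLKs are compared by a bit-set similarity, and linkage quality is summarised as precision, recall and F-measure. It is an illustration under stated assumptions, not the implementation evaluated in the paper. The filter length, the number of hash functions, the choice of HMAC-SHA256/HMAC-SHA512 with double hashing, the Tanimoto coefficient, and all function names are hypothetical choices made for this example. A multibit tree is not implemented here; the sketch simply compares two filters directly.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a padded, lower-cased string into its set of 2-character substrings."""
    padded = f" {value.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def clk(fields, secret, length=1000, num_hashes=10):
    """Encode several identifier fields into one composite Bloom filter.

    The filter is returned as the frozenset of bit positions set to 1.
    length and num_hashes are illustrative values, not tuned parameters.
    """
    bits = set()
    for value in fields:
        for gram in bigrams(value):
            data = gram.encode("utf-8")
            # Double hashing: position_i = (h1 + i * h2) mod length (assumed scheme).
            h1 = int.from_bytes(hmac.new(secret, data, hashlib.sha256).digest(), "big")
            h2 = int.from_bytes(hmac.new(secret, data, hashlib.sha512).digest(), "big")
            for i in range(num_hashes):
                bits.add((h1 + i * h2) % length)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two bit-position sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def linkage_quality(found_pairs, true_pairs):
    """Return (precision, recall, F-measure) of found pairs against a gold standard."""
    true_positives = len(found_pairs & true_pairs)
    precision = true_positives / len(found_pairs) if found_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key agreed between the data custodians
    record_a = clk(["Smith", "John", "1970-01-31"], key)
    record_b = clk(["Smyth", "John", "1970-01-31"], key)
    print("similarity:", round(tanimoto(record_a, record_b), 3))
    print("quality:", linkage_quality({(1, 2), (3, 4)}, {(1, 2), (5, 6)}))
```

Because small spelling differences change only a few bigrams, the two example records share most of their set bits, which is what allows approximate matching on encoded identifiers. Deciding which similarity threshold counts as a match is one of the parameters that, as the abstract notes, needs tuning.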


Bibliographic record

Source: BMC Medical Informatics and Decision Making, 2017, Issue 1
Authors: Adrian P. Brown; Christian Borgs; Sean M. Randall; Rainer Schnell
Original format: PDF
Chinese Library Classification: Medicine and health

Similar literature

1. High quality linkage using Multibit Trees for privacy-preserving blocking [J]. Adrian Brown, Christian Borgs, Sean Randall. International Journal of Population Data Science, 2017, Issue 1.
2. BRAZILIAN HEALTHCARE RECORD LINKAGE (BRHC-RLK) - A RECORD LINKAGE METHODOLOGY FOR BRAZILIAN MEDICAL CLAIMS DATASETS (DATASUS) [J]. Campos D. F., Rosim R. P., Duva A. S. Value in health: the journal of the International Society for Pharmacoeconomics and Outcomes Research, 2017, Issue 5.
3. Revisiting distance-based record linkage for privacy-preserving release of statistical datasets [J]. Herranz Javier, Nin Jordi, Rodriguez Pablo. Data & Knowledge Engineering, 2015, Nov., Pt. A.
4. Towards Privacy-Preserving Record Linkage with Record-Wise Linkage Policy [C]. Takahito Kaiho, Wen-jie Lu, Toshiyuki Amagasa. International conference on database and expert systems applications; International workshop on advanced ICT technologies for secure societies; International workshop on big data management in cloud systems; International workshop on biological knowledge discovery; International workshop on technologies for information retrieval; International workshop on uncertainty in cloud computing, 2017.
5. Informing, evaluating and automating the record linkage process for reliably combining disparate datasets [D]. DuVall, Scott Leroy. 2010.
6. Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets [O]. Adrian P. Brown, Christian Borgs, Sean M. Randall. 2017.
