Journal: BMC Medical Informatics and Decision Making

Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets



Abstract

Background: Integrating medical data using databases from different sources by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand encrypting these identifiers. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. Real-world performance of these techniques using large-scale data is unknown up to now.

Methods: Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold-standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated.

Results: Clear text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters.

Conclusions: We argue that increased privacy of PPRL comes with the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
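To make the ideas in the abstract concrete, the following is a minimal sketch of how a CLK might be built and compared, and how linkage quality could be scored. It is an illustration only, not the authors' implementation: the function names (`bigrams`, `clk`, `dice`, `precision_recall_f`), the filter length of 1000 bits, the 10 hash functions, the choice of HMAC-SHA256 and HMAC-SHA512 for double hashing, and the sample names are all assumptions made for this example. The multibit tree search used in the paper is not shown; every pair would have to be compared directly here.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a padded, lower-cased string into overlapping 2-character tokens."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def clk(fields, key, length=1000, k=10):
    """Hash the bigrams of all fields into ONE shared Bloom filter.

    Returns the set of bit positions that are set. `length` and `k`
    are illustrative values, not parameters taken from the paper.
    """
    bits = set()
    for value in fields:
        for gram in bigrams(value):
            data = gram.encode("utf-8")
            # Keyed double hashing: position_i = (h1 + i * h2) mod length
            h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
            h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
            for i in range(k):
                bits.add((h1 + i * h2) % length)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def precision_recall_f(found, truth):
    """Score a set of linked pairs against a gold-standard set of pairs."""
    tp = len(found & truth)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    secret = b"shared-secret-key"  # hypothetical key agreed on by the data owners
    a = clk(["Smith", "John"], secret)
    b = clk(["Smyth", "John"], secret)    # spelling variant of the same name
    c = clk(["Garcia", "Maria"], secret)  # unrelated name
    print("similar pair:  ", round(dice(a, b), 3))
    print("dissimilar pair:", round(dice(a, c), 3))
```

In this sketch, a pair of records would be declared a match when its Dice coefficient exceeds a chosen threshold. That threshold, together with the filter length and number of hash functions, is the kind of parameter the abstract says required fine tuning.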
