Journal: BMC Medical Informatics and Decision Making

Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation



Abstract

Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.

Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.

Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
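The abstract does not spell out the protocol's steps, so the following is only a minimal sketch of the underlying idea of deterministic record linkage across horizontal partitions, not the authors' protocol. It assumes a hypothetical shared secret key and hypothetical identifier fields (`national_id`, `birth_date`), and it compares keyed hashes in one place, which does not provide the collusion resistance analysed in the paper.

```python
import hashlib
import hmac


def linkage_token(record, key, fields=("national_id", "birth_date")):
    """Deterministic linkage key: HMAC-SHA256 over normalized identifier fields.

    The field names are hypothetical; they stand in for whatever identifiers
    a deterministic record linkage rule would use.
    """
    normalized = "|".join(str(record[f]).strip().lower() for f in fields)
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deduplicate(partitions, key):
    """Keep the first occurrence of each token across all custodians.

    partitions: dict mapping custodian name -> list of record dicts.
    Returns a dict mapping custodian name -> list of records to retain.
    Custodians are visited in sorted name order so the result is reproducible.
    """
    seen = set()
    kept = {}
    for custodian in sorted(partitions):
        kept[custodian] = []
        for record in partitions[custodian]:
            token = linkage_token(record, key)
            if token not in seen:
                seen.add(token)
                kept[custodian].append(record)
    return kept


# Illustrative data only (assumption): two laboratories sharing one patient.
EXAMPLE_KEY = b"shared-secret-for-illustration-only"
EXAMPLE_PARTITIONS = {
    "lab_a": [
        {"national_id": "123", "birth_date": "1980-01-01"},
        {"national_id": "456", "birth_date": "1975-05-05"},
    ],
    "lab_b": [
        {"national_id": " 123 ", "birth_date": "1980-01-01"},  # duplicate of lab_a
        {"national_id": "789", "birth_date": "1990-09-09"},
    ],
}

if __name__ == "__main__":
    result = deduplicate(EXAMPLE_PARTITIONS, EXAMPLE_KEY)
    for name, records in result.items():
        print(name, len(records))
```

In this sketch each record is hashed once and checked against a set, so the work grows linearly with the total number of records. That is in the same spirit as the linear scaling reported in the Results, although the actual protocol's costs also depend on its cryptographic and communication steps.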
