Privacy-Preserving Record Linkage with Spark

Abstract

Privacy considerations require careful and secure processing of personal data. This is especially true when personal data is linked against databases from other organizations. In such cases, privacy-preserving record linkage (PPRL) can be used to prevent needless exposure of sensitive information to other organizations. As the volume of personal data being gathered and analyzed grows, scalable PPRL capable of handling massive databases is highly desirable. In this work, we evaluate Apache Spark as an option for scaling PPRL. A scalable PPRL implementation is valuable in itself, and one based on Spark would also be widely deployable and could take advantage of further development of its ecosystem. Our results show that a PPRL solution based on Spark outperforms alternatives when handling multiple millions of records, scales to dozens of nodes, and is on par with regular record linkage implementations in terms of achieved results.