Venue: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Sorted Nearest Neighborhood Clustering for Efficient Private Blocking

Abstract

Record linkage is an emerging research area that various real-world applications require in order to identify which records in different data sources refer to the same real-world entities. Privacy concerns and restrictions often prevent the use of traditional record linkage applications across different organizations. Linking records in situations where no private or confidential information can be revealed is known as privacy-preserving record linkage (PPRL). As with traditional record linkage applications, scalability is a main challenge in PPRL. This challenge is generally addressed by employing a blocking technique that reduces the number of candidate record pairs by removing, without comparing them in detail, pairs that likely refer to non-matches. This paper presents an efficient private blocking technique based on a sorted neighborhood approach that combines k-anonymous clustering and the use of public reference values. An empirical study conducted on real-world databases shows that this approach is scalable to large databases, and that it can provide effective blocking while preserving k-anonymous characteristics. The proposed approach can be up to two orders of magnitude faster than two state-of-the-art private blocking techniques, k-nearest neighbor clustering and Hamming-based locality sensitive hashing.
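
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it names: records are placed among sorted public reference values, and neighboring groups are merged until each block holds at least k records. The function name `build_blocks`, the (record_id, sorting_key) input format, and the merge rule are all assumptions made for illustration, not the authors' method. The sketch also covers a single party only and omits the private exchange between parties.

```python
from bisect import bisect_right


def build_blocks(records, reference_values, k):
    """Illustrative single-party sketch (hypothetical, not the paper's algorithm).

    records: iterable of (record_id, sorting_key) pairs.
    reference_values: public values that partition the sorted key space.
    k: minimum block size, so each block hides a record among >= k records.
    """
    refs = sorted(reference_values)

    # One bucket per gap between sorted reference values; bucket 0 holds
    # keys that sort before every reference value.
    buckets = [[] for _ in range(len(refs) + 1)]
    for rec_id, key in records:
        buckets[bisect_right(refs, key)].append(rec_id)

    # Walk the buckets in sorted order and merge neighbors until the
    # running block reaches at least k records.
    blocks, current = [], []
    for bucket in buckets:
        current.extend(bucket)
        if len(current) >= k:
            blocks.append(current)
            current = []

    # Leftover records (fewer than k) join the last block. If there is no
    # earlier block, the data set itself has fewer than k records.
    if current:
        if blocks:
            blocks[-1].extend(current)
        else:
            blocks.append(current)
    return blocks


sample_records = [
    (1, "adams"), (2, "baker"), (3, "clark"), (4, "davis"),
    (5, "evans"), (6, "smith"), (7, "young"),
]
sample_refs = ["brown", "jones", "miller"]
print(build_blocks(sample_records, sample_refs, k=2))
```

Only records that fall in the same block would later be compared in detail, which is how blocking reduces the number of candidate pairs. A larger k gives more anonymity per block but leaves more pairs to compare.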