Journal: IEEE Transactions on Computers

Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud

Abstract

Cloud computing provides promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets in such applications, such as electronic health records, often contain privacy-sensitive information, which raises privacy concerns if the information is released or shared with third parties in the cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches, which are tailored to small-scale data sets, often fall short when encountering big data because of their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model that allows for semantic proximity of sensitive values and multiple sensitive attributes, and we model the problem of local recoding as a proximity-aware clustering problem. To address this problem, we propose a scalable two-phase clustering approach consisting of a t-ancestors clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time efficiency of local-recoding anonymization, over existing approaches.
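To make the idea of clustering-based local recoding in a map/reduce style more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a single numeric quasi-identifier (an age column) and swaps in a plain k-means-style round: the ancestor-based clustering and the proximity-aware agglomerative phase mentioned in the abstract are not reproduced. All function names (`map_assign`, `reduce_update`, `one_round`, `local_recode`) are hypothetical, and the map and reduce steps run in-process rather than on a real MapReduce cluster.

```python
from collections import defaultdict


def map_assign(record, centers):
    """Map step (illustrative): emit (index of nearest center, record)."""
    nearest = min(range(len(centers)), key=lambda i: abs(record - centers[i]))
    return nearest, record


def reduce_update(records):
    """Reduce step (illustrative): the new center is the mean of one group."""
    return sum(records) / len(records)


def one_round(records, centers):
    """Run one map/shuffle/reduce round of a k-means-style clustering."""
    groups = defaultdict(list)
    for record in records:
        key, value = map_assign(record, centers)
        groups[key].append(value)  # stands in for the shuffle phase
    new_centers = [
        reduce_update(groups[i]) if groups[i] else centers[i]
        for i in range(len(centers))
    ]
    return new_centers, groups


def local_recode(groups):
    """Generalize each group to the [min, max] range of its members."""
    return {i: (min(g), max(g)) for i, g in groups.items() if g}


if __name__ == "__main__":
    ages = [21, 23, 25, 60, 62, 64]          # made-up sample values
    centers, groups = one_round(ages, [20, 70])
    print(centers)
    print(local_recode(groups))
```

In this sketch every record in a group would be published with the group's range instead of its exact value, which is the sense in which the recoding is "local": the same original value may be generalized differently in different groups. A real deployment would also have to enforce the privacy model (for example, minimum group sizes and constraints on the sensitive values within a group), which this example does not attempt.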
