Privacy and utility analysis of the randomization approach in Privacy-Preserving Data Publishing.


Abstract

Randomization has emerged as an important approach for disguising data in Privacy-Preserving Data Publishing (PPDP). Depending on the type of data it is applied to, the randomization approach falls into two classes: Random Perturbation (RP) for continuous data and Randomized Response (RR) for categorical data. In PPDP, utility is an important metric that refers to the preservation of data mining information, while privacy, an even more important metric, refers to the preservation of the original information. Privacy can be determined by several factors, such as attribute correlations and randomization parameters. However, regarding attribute correlations, no one has studied whether they affect privacy and, if so, how they affect the privacy-preserving property of randomization. Regarding randomization parameters, no one has investigated how to systematically compare different parameter choices, or which parameters are optimal in the sense that the disguised data are as privacy-preserving as possible yet still useful for data mining computations.

This thesis addresses these problems. First, we identify the correlations among attributes as a key factor affecting privacy. We propose two data reconstruction methods based on correlations among continuous attributes, and we analyze the relationship between data correlations and the amount of private information that can be disclosed using these reconstruction schemes. Our studies show that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed. To improve privacy, we propose a modified randomization scheme that takes the identified factor, attribute correlations, into account. Our experimental results show that with the modified randomization method, the reconstruction accuracy of both reconstruction methods decreases, i.e., less private information is disclosed.
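
To illustrate why attribute correlation matters, here is a minimal Python sketch of one correlation-based reconstruction: projecting additively perturbed data onto its leading principal components. It is not necessarily either of the two methods proposed in the thesis. The helper names (`make_correlated_data`, `pca_reconstruct`, `run`), the Gaussian data model with equal pairwise correlation `rho`, the known noise variance, and keeping a single component are all assumptions made for illustration. It requires NumPy.

```python
import numpy as np

def make_correlated_data(n, m, rho, rng):
    """Hypothetical helper: n records of m unit-variance attributes with pairwise correlation rho."""
    cov = np.full((m, m), rho)
    np.fill_diagonal(cov, 1.0)
    return rng.multivariate_normal(np.zeros(m), cov, size=n)

def pca_reconstruct(disguised, noise_var, k):
    """Estimate the original records by projecting onto the top-k principal components.

    Assumes i.i.d. additive noise with known variance noise_var on every attribute,
    so Cov(original) is estimated as Cov(disguised) - noise_var * I.
    """
    mean = disguised.mean(axis=0)
    centered = disguised - mean
    m = disguised.shape[1]
    cov_est = np.cov(centered, rowvar=False) - noise_var * np.eye(m)
    _, eigvecs = np.linalg.eigh(cov_est)  # eigenvalues come back in ascending order
    top = eigvecs[:, -k:]                 # keep the k leading eigenvectors
    return centered @ top @ top.T + mean

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def run(rho, n=5000, m=10, noise_var=1.0, k=1, seed=0):
    """Return (RMSE of disguised data, RMSE of PCA reconstruction) for correlation rho."""
    rng = np.random.default_rng(seed)
    original = make_correlated_data(n, m, rho, rng)
    disguised = original + rng.normal(0.0, np.sqrt(noise_var), size=original.shape)
    recon = pca_reconstruct(disguised, noise_var, k)
    return rmse(disguised, original), rmse(recon, original)

if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.9):
        noisy, recon = run(rho)
        print(f"rho={rho}: disguised RMSE={noisy:.3f}, reconstructed RMSE={recon:.3f}")
```

Under these assumptions, strongly correlated attributes concentrate most of the signal in a few principal directions while the independent noise is spread evenly over all of them, so discarding the minor directions removes most of the noise. With weak correlation the same projection also discards real signal and recovers little.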
Second, for RR, we formulate the quantification of privacy and utility as estimation problems. Using these quantifications to compare different RR schemes, we employ an evolutionary multi-objective optimization method to find optimal randomization parameters for RR. The experimental results show that our scheme performs much better than existing RR schemes. Third, for RP, we formulate an RP technique that is more general than the existing one. Having generalized the RP technique, we discretize the data range and use a matrix to hold the randomization parameters. We also formulate the quantification of privacy and utility for the generalized RP technique as estimation problems. Because measuring utility is expensive, we propose an efficient approach to approximate it. Based on the privacy and approximate utility metrics, we use an evolutionary multi-objective optimization method to find optimal randomization parameters for RP. We show that our parameter-selection scheme outperforms the existing scheme.
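
Both RR and a discretized RP can be viewed as a randomization matrix `M` with `M[i, j]` = P(reported value i | original value j). The Python sketch below assumes a simple k-category generalization of Warner's scheme (keep the true value with probability `theta`, otherwise report one of the other categories uniformly). It also assumes two illustrative estimation-based metrics. Utility is the error in estimating the original distribution from the disguised data. Privacy is the probability that an adversary's Bayes (MAP) guess of a record's original value is correct. These metrics and all function names are hypothetical stand-ins; the thesis's exact definitions may differ, and its evolutionary multi-objective search is not implemented here.

```python
import numpy as np

def warner_matrix(k, theta):
    """Hypothetical k-category Warner-style matrix: M[i, j] = P(reported i | original j).
    theta must differ from 1/k, otherwise M is singular."""
    off = (1.0 - theta) / (k - 1)
    M = np.full((k, k), off)
    np.fill_diagonal(M, theta)
    return M

def randomize(values, M, rng):
    """Replace each original category j by a draw from column j of M."""
    out = np.empty_like(values)
    for j in range(M.shape[1]):
        idx = np.flatnonzero(values == j)
        out[idx] = rng.choice(M.shape[0], size=idx.size, p=M[:, j])
    return out

def estimate_distribution(disguised, M):
    """Utility side: solve M p = q, where q is the empirical distribution of disguised values.
    The raw solution may contain small negative entries; it is not projected onto the simplex."""
    q = np.bincount(disguised, minlength=M.shape[0]) / len(disguised)
    return np.linalg.solve(M, q)

def map_guess_accuracy(p, M):
    """Privacy side (one possible measure): P(adversary's MAP guess of the original is correct),
    assuming the adversary knows the prior p and the matrix M."""
    joint = M * p  # joint[i, j] = P(reported i, original j)
    return float(joint.max(axis=1).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p_true = np.array([0.5, 0.3, 0.2])
    original = rng.choice(3, size=20000, p=p_true)
    for theta in (0.9, 0.6, 0.4):
        M = warner_matrix(3, theta)
        p_hat = estimate_distribution(randomize(original, M, rng), M)
        err = float(np.abs(p_hat - p_true).sum())
        print(f"theta={theta}: MAP accuracy={map_guess_accuracy(p_true, M):.3f}, "
              f"L1 estimation error={err:.4f}")
```

Raising `theta` makes the distribution estimate more accurate but also makes the MAP guess more accurate, i.e., less private. A multi-objective optimizer would search over the entries of `M` for matrices that no other candidate beats on both objectives at once.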

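Bibliographic details

  • Author: Huang, Zhengli
  • Affiliation: Syracuse University
  • Degree-granting institution: Syracuse University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 155
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology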