
Privacy and utility analysis of the randomization approach in Privacy-Preserving Data Publishing.

Abstract

Randomization has emerged as an important approach for disguising data in Privacy-Preserving Data Publishing (PPDP). Depending on the type of data it is applied to, the randomization approach falls into two classes: Random Perturbation (RP) for continuous data and Randomized Response (RR) for categorical data. In PPDP, utility is an important metric that refers to the preservation of data mining information, while privacy, an even more important metric, refers to the preservation of the original information. Privacy can be determined by different aspects, such as attribute correlations and randomization parameters. However, regarding attribute correlations, no one has studied whether they are a factor affecting privacy and how they affect the privacy-preserving property of randomization. Regarding randomization parameters, no one has investigated how to systematically compare different randomization parameters, or what the optimal parameters are, such that the disguised data are maximally privacy-preserving yet still useful for data mining computations.

This thesis addresses these problems. First, we identify the correlations among attributes as a key factor affecting privacy. We propose two data reconstruction methods based on the correlations among continuous attributes. Using these reconstruction schemes, we analyze the relationship between data correlations and the amount of private information that can be disclosed. Our studies show that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed. To improve privacy, we propose a modified randomization scheme based on the identified factor, the attribute correlations. Our experimental results show that when the improved randomization method is used, the reconstruction accuracy of both reconstruction methods becomes worse, i.e., less private information is disclosed.
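The correlation effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the thesis's own reconstruction method: it assumes additive Gaussian noise as the RP scheme, equally correlated Gaussian attributes, and a PCA-style projection onto the leading principal component as one possible correlation-based reconstruction. The function names (`make_data`, `perturb`, `pca_reconstruct`) and all parameter values are hypothetical.

```python
import numpy as np

def make_data(n, m, rho, rng):
    """Hypothetical data: m unit-variance Gaussian attributes with pairwise correlation rho."""
    cov = np.full((m, m), rho) + (1.0 - rho) * np.eye(m)
    return rng.multivariate_normal(np.zeros(m), cov, size=n)

def perturb(X, sigma, rng):
    """Assumed RP scheme: additive noise Y = X + R with i.i.d. R ~ N(0, sigma^2)."""
    return X + rng.normal(0.0, sigma, size=X.shape)

def pca_reconstruct(Y, sigma, k):
    """Estimate X by projecting the centered disguised data Y onto the
    top-k principal components of the estimated Cov(X).

    With independent noise of known variance, Cov(X) ~= Cov(Y) - sigma^2 I.
    The shift leaves the eigenvectors unchanged. Correlated signal concentrates
    in the leading components, while the noise is spread evenly over all m
    directions, so discarding the remaining components removes mostly noise."""
    mu = Y.mean(axis=0)
    cov_x = np.cov(Y, rowvar=False) - sigma**2 * np.eye(Y.shape[1])
    _, eigvecs = np.linalg.eigh(cov_x)   # eigenvalues come back in ascending order
    Q = eigvecs[:, -k:]                  # k leading eigenvectors (m x k)
    return mu + (Y - mu) @ Q @ Q.T

rng = np.random.default_rng(0)
sigma = 1.0
mse_by_rho = {}
for rho in (0.1, 0.5, 0.9):
    X = make_data(n=5000, m=10, rho=rho, rng=rng)
    Y = perturb(X, sigma, rng)
    X_hat = pca_reconstruct(Y, sigma, k=1)
    mse_by_rho[rho] = float(np.mean((X_hat - X) ** 2))
    print(f"rho={rho:.1f}  reconstruction MSE={mse_by_rho[rho]:.3f}")
```

Publishing Y and using it directly as the estimate gives an error of about sigma^2 = 1. Under these assumptions the reconstruction error should drop well below that baseline as rho grows, which mirrors the claim that highly correlated attributes let an adversary recover more of the original data.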
Second, for RR, we formulate the quantification of privacy and utility as estimation problems. Using these quantifications to compare different RR schemes, we employ an evolutionary multi-objective optimization method to find optimal randomization parameters for RR. The experimental results show that our scheme performs much better than existing RR schemes. Third, for RP, we first formulate an RP technique that is more general than the existing one. After generalizing the RP technique, we discretize the data range and use a matrix to hold the randomization parameters. We also formulate the quantification of privacy and utility for the generalized RP technique as estimation problems. Because measuring utility is expensive, we propose an efficient approach to approximate it. Based on the privacy and approximate utility metrics, we use an evolutionary multi-objective optimization method to find optimal randomization parameters for RP. We show that our scheme for choosing the parameters outperforms the existing scheme.
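Treating utility as an estimation problem can be sketched for RR, and equally for a discretized RP whose parameters live in a matrix, assuming a column-stochastic matrix M where M[i, j] is the probability of reporting category i when the original category is j. Because the observed distribution equals M times the original distribution, the original distribution can be estimated by solving that linear system. The uniform "keep with probability theta" matrix below is a hypothetical, deliberately simple choice, not one of the schemes evaluated in the thesis, and the evolutionary multi-objective search over M is not shown. The names `rr_matrix`, `disguise`, and `estimate_original` are illustrative.

```python
import numpy as np

def rr_matrix(k, theta):
    """Hypothetical RR matrix: keep the original category with probability theta,
    otherwise report one of the other k - 1 categories uniformly at random.
    Column j holds P(reported = i | original = j), so every column sums to 1."""
    off = (1.0 - theta) / (k - 1)
    return np.full((k, k), off) + (theta - off) * np.eye(k)

def disguise(values, M, rng):
    """Replace each original category j by a random draw from column j of M."""
    k = M.shape[0]
    out = np.empty_like(values)
    for j in range(k):
        idx = np.flatnonzero(values == j)
        out[idx] = rng.choice(k, size=idx.size, p=M[:, j])
    return out

def estimate_original(disguised, M):
    """Estimate the original distribution from the disguised data.

    The observed distribution satisfies p_obs = M @ p_orig, so p_orig is
    recovered by solving this linear system (M must be invertible)."""
    k = M.shape[0]
    p_obs = np.bincount(disguised, minlength=k) / disguised.size
    return np.linalg.solve(M, p_obs)

rng = np.random.default_rng(1)
true_p = np.array([0.5, 0.3, 0.2])
original = rng.choice(3, size=50_000, p=true_p)
M = rr_matrix(k=3, theta=0.7)
disguised = disguise(original, M, rng)
p_hat = estimate_original(disguised, M)
print("estimated distribution:", np.round(p_hat, 3))
```

Lowering theta toward 1/k makes each reported value less informative (more privacy), but it also pushes M toward singularity, so the estimate becomes noisier (less utility). This is the privacy/utility tension that an optimizer over the entries of M has to balance.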


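Bibliographic details

Author: Huang, Zhengli.
Affiliation: Syracuse University.
Degree-granting institution: Syracuse University.
Discipline: Computer Science.
Degree: Ph.D.
Year: 2008
Pages: 155 p.
Original format: PDF
Language: English
CLC classification: Automation technology, computer technology
Date added: 2022-08-17 11:38:58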

Similar documents

1. Preliminary Data Analysis in Healthcare Multicentric Data Mining: a Privacy-preserving Distributed Approach [J]. Andrea Damiani, Carlotta Masciocchi, Luca Boldrini. Je-LKS. 2018, no. 1.

2. Privacy-Preserving Classification Rule Mining for Balancing Data Utility and Knowledge Privacy Using Adapted Binary Firefly Algorithm [J]. G. Kalyani, M. V. P. Chandra Sekhara Rao, B. Janakiramaiah. Arabian Journal for Science and Engineering. 2018, no. 8.

3. A privacy-preserving approach for multimodal transaction data integrated analysis [J]. Sui Peipei, Li Xianxian. Neurocomputing. 2017, Aug. 30.

4. Privacy-Preserving Data Publishing in the Cloud: A Multi-level Utility Controlled Approach [C]. Palanisamy Balaji, Ling Liu. 2015 IEEE 8th International Conference on Cloud Computing. 2015.

5. A Generic Privacy Quantification Framework for Privacy-Preserving Data Publishing [D]. Zhu, Zutao. 2010.

6. Privacy-preserving biomedical data dissemination via a hybrid approach [O]. Yichen Jiang, Chenghong Wang, Zhixuan Wu. 2018.

7. Privacy-Preserving Data Publishing in the Cloud: A Multi-level Utility Controlled Approach [O]. Palanisamy B, Liu L. 2015.
