Perturbation based privacy preserving data mining techniques for real-world data.

Abstract

The perturbation method has been extensively studied for privacy-preserving data mining. In this method, random noise from a known distribution is added to the privacy-sensitive data before the data is sent to the data miner. Subsequently, the data miner reconstructs an approximation to the original data distribution from the perturbed data and uses the reconstructed distribution for data mining purposes. Because noise is added, perturbation-based approaches always trade loss of information against preservation of privacy. The question is to what extent users are willing to compromise their privacy. This choice varies from individual to individual: different individuals may have different attitudes towards privacy based on customs and cultures. Unfortunately, current perturbation-based privacy-preserving data mining techniques do not allow individuals to choose their desired privacy levels. This is a drawback, as privacy is a personal choice. In this dissertation, we propose an individually adaptable perturbation model, which enables individuals to choose their own privacy levels. The effectiveness of our new approach is demonstrated by various experiments conducted on both synthetic and real-world data sets.

Reconstruction of the original distribution has been questioned because of potential privacy breaches. After investigating the reconstruction step in detail, we also question the applicability of this approach to real-world data. In this dissertation, we propose a new perturbation-based technique. In our solution, instead of rebuilding the original data distribution, we modify the data mining algorithms so that they can be used directly on the perturbed data. In other words, we directly build a classifier for the original data set from the perturbed training data set. Our approach is especially suitable for scenarios where the reconstruction of the original data distribution may not be successful because of the limited amount of training data.
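
The additive-noise idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function names, the Gaussian noise model, the three noise levels, and the synthetic "ages" data are all assumptions made for illustration. Each individual picks a noise standard deviation (a stand-in for a personal privacy level), and the miner then estimates only the mean and variance of the original data from the perturbed values. This is a much simpler moment-based estimate than reconstructing a full distribution.

```python
import random
import statistics


def perturb(values, noise_sigmas, rng):
    """Add zero-mean Gaussian noise to each value.

    noise_sigmas[i] is the noise level chosen by individual i
    (hypothetical stand-in for a per-individual privacy level).
    """
    return [v + rng.gauss(0.0, s) for v, s in zip(values, noise_sigmas)]


def estimate_original_moments(perturbed, noise_sigmas):
    """Estimate mean and variance of the original data from perturbed data.

    With zero-mean noise independent of the data:
      E[w] = E[x]   and   Var[w] = Var[x] + mean(sigma_i ** 2)
    so the noise variance can be subtracted out. This recovers two
    moments only, not the full distribution.
    """
    mean_est = statistics.fmean(perturbed)
    noise_var = statistics.fmean([s * s for s in noise_sigmas])
    var_est = max(statistics.pvariance(perturbed) - noise_var, 0.0)
    return mean_est, var_est


rng = random.Random(0)
n = 20000

# Synthetic sensitive attribute (assumed): ages ~ N(40, 10^2).
ages = [rng.gauss(40.0, 10.0) for _ in range(n)]

# Each individual chooses one of three assumed noise levels.
levels = [rng.choice([2.0, 5.0, 10.0]) for _ in range(n)]

perturbed = perturb(ages, levels, rng)
mean_est, var_est = estimate_original_moments(perturbed, levels)

print(f"estimated mean: {mean_est:.2f}, estimated variance: {var_est:.2f}")
```

Larger noise levels give an individual more privacy but make the miner's estimates noisier, which is the information-versus-privacy trade-off the abstract refers to.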

Bibliographic record

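  • Author

    Liu, Li

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution: The University of Texas at Dallas
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 113 p.
  • Total pages: 113
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology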
