
LACE: Supporting Privacy-Preserving Data Sharing in Transfer Defect Learning.


Abstract

Cross-Project Defect Prediction (CPDP) is a field of study in which an organization that lacks sufficient local data uses data from other organizations or projects to build defect predictors. Research in CPDP has revealed challenges in using "other" data, and transfer defect learning has emerged to improve the quality of CPDP results. With this newfound success in CPDP, it is increasingly important to address the privacy concerns of data owners.

To support CPDP, data must be shared, yet many privacy threats inhibit data sharing. We focus on sensitive attribute disclosure attacks, in which an attacker seeks to associate one or more records in a data set with their sensitive information. Solutions to this sharing problem come from the field of Privacy-Preserving Data Publishing (PPDP), which covers methods and tools used to disguise raw data for publishing and which has emerged as a means to frustrate sensitive attribute disclosure attacks and thereby reduce privacy concerns. However, prior work warned that increasing data privacy decreases the efficacy of data mining on privatized data.

The goal of this research is to encourage organizations and individuals to share their data publicly or with each other, whether for research purposes or to improve the quality of their software products through defect prediction. The contributions of this work offer three benefits to data owners willing to share privatized data: 1) they are made fully aware of the sensitive attribute disclosure risks involved, so they can make an informed decision about what to share; 2) they are able to privatize their data and have it remain useful; and 3) they are able to work with others, sharing data based on what they learn from each other's data. We call this private multiparty data sharing.

To achieve these benefits, this dissertation presents LACE (Large-scale Assurance of Confidentiality Environment). LACE incorporates a privacy metric called IPR (Increased Privacy Ratio), which calculates the risk of sensitive attribute disclosure by comparing the results of queries (attacks) on the original data with the results of the same queries on a privatized version of that data. LACE also includes a privacy algorithm that uses intelligent instance selection to prune the data to as little as 10% of the original (thus offering complete privacy to the other 90%). It then mutates the remaining data, which can make over 70% of sensitive attribute disclosure attacks unsuccessful. Finally, LACE can facilitate private multiparty data sharing via a unique leader-follower algorithm developed for this dissertation. The algorithm lets data owners build a privatized data set serially, with each owner contributing only data that are not already in the private cache. In this scenario, each data owner shares even less of their data, in some cases as little as 2%.

The experiments of this thesis lead to the following conclusion: at least for the defect data studied here, data can be minimized, privatized, and shared without a significant degradation in utility. Specifically, in comparative studies with standard privacy models (k-anonymity and data swapping) applied to 10 open-source data sets and 3 proprietary data sets, LACE produces privatized data sets that are significantly smaller than the original data (as low as 2%). As a result, LACE offers better protection against sensitive attribute disclosure attacks than other methods.
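The abstract describes IPR only as a comparison of query (attack) results on the original and privatized data. The sketch below is one possible reading, not the dissertation's definition. It assumes that a query is a set of attribute-value pairs, that the attacker guesses the most frequent sensitive value among the matching rows, and that a breach occurs when the guess from the privatized data equals the answer from the original data. The function names, the dictionary row format, and the percentage scale are all illustrative assumptions.

```python
from collections import Counter


def most_common_sensitive(rows, query, sensitive):
    """Most frequent sensitive value among rows matching the query, or None if no row matches."""
    matches = [
        row[sensitive]
        for row in rows
        if all(row.get(attr) == value for attr, value in query.items())
    ]
    if not matches:
        return None
    return Counter(matches).most_common(1)[0][0]


def increased_privacy_ratio(original, privatized, queries, sensitive):
    """Percentage of queries that do NOT breach (assumed definition, see lead-in)."""
    if not queries:
        raise ValueError("at least one query is required")
    breaches = 0
    for query in queries:
        truth = most_common_sensitive(original, query, sensitive)
        guess = most_common_sensitive(privatized, query, sensitive)
        # Assumed breach rule: the privatized data still reveals the original answer.
        if truth is not None and guess == truth:
            breaches += 1
    return 100.0 * (1 - breaches / len(queries))


# Toy data: "s" plays the role of the sensitive attribute.
original = [
    {"loc": "high", "cbo": "low", "s": "A"},
    {"loc": "high", "cbo": "high", "s": "A"},
    {"loc": "low", "cbo": "low", "s": "B"},
]
privatized = [
    {"loc": "high", "cbo": "low", "s": "B"},  # mutated row
    {"loc": "low", "cbo": "low", "s": "B"},
]
queries = [{"loc": "high"}, {"loc": "low"}]
ipr = increased_privacy_ratio(original, privatized, queries, "s")
```

Under these assumptions, a higher value means fewer attacks recover the original sensitive value; the toy data and attribute names are invented for illustration and are not drawn from the data sets studied in the dissertation.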
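The serial, leader-follower style of sharing described in the abstract, where each owner contributes only data not already in the private cache, might be sketched as follows. This is a minimal illustration under stated assumptions: rows are numeric tuples, "not already in the cache" is approximated by a Euclidean distance threshold (`min_gap`, a hypothetical parameter), and the pruning and mutation steps that LACE applies before sharing are omitted. None of the names below come from the dissertation.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contribute(cache, rows, min_gap):
    """Append to the cache only rows farther than min_gap from every cached row."""
    added = 0
    for row in rows:
        # all() over an empty cache is True, so the leader's first row is always added.
        if all(distance(row, kept) > min_gap for kept in cache):
            cache.append(row)
            added += 1
    return added


def build_shared_cache(owners, min_gap):
    """Pass the cache from owner to owner; the first owner acts as the leader."""
    cache = []
    shares = []  # fraction of each owner's rows that ended up in the cache
    for rows in owners:
        added = contribute(cache, rows, min_gap)
        shares.append(added / len(rows) if rows else 0.0)
    return cache, shares


owners = [
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)],    # leader
    [(0.0, 0.1), (5.0, 5.1), (10.0, 10.0)],  # follower
]
cache, shares = build_shared_cache(owners, min_gap=1.0)
```

In this toy run the follower adds only the one row that is far from everything the leader already shared, which mirrors the abstract's point that later contributors expose a smaller fraction of their data.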

Bibliographic details

  • Author: Peters, Fayola
  • Author affiliation: West Virginia University
  • Degree-granting institution: West Virginia University
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 162
  • Original format: PDF
  • Language: English
