Conference: International Conference on Data Engineering

P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

Abstract

How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach to this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, existing methods cannot adequately handle high-dimensional data. In particular, when the input dataset contains a large number of features, existing techniques require injecting a prohibitive amount of noise to satisfy differential privacy, which renders the outsourced data analysis meaningless. To address this issue, this paper proposes the privacy-preserving phased generative model (P3GM), a differentially private generative model for releasing such sensitive data. P3GM employs a two-phase learning process to make it robust against the noise and to increase learning efficiency (e.g., easier convergence). We give theoretical analyses of the learning complexity and privacy loss of P3GM. We further evaluate the proposed method experimentally and demonstrate that P3GM significantly outperforms existing solutions. Compared with state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of data diversity. Moreover, in several data mining tasks using synthesized data, our model outperforms the competitors in terms of accuracy.
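
As a rough illustration of why the required noise grows with the number of features, the following Python sketch releases a differentially private mean of per-record clipped vectors using the textbook Gaussian mechanism. This is not the P3GM algorithm. The function names, the clipping norm, and the replace-one-record notion of neighboring datasets are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Noise scale of the classic Gaussian mechanism (stated for 0 < epsilon < 1)."""
    return l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def private_mean(data, clip_norm, epsilon, delta, rng):
    """Return a noisy mean of the rows of `data` (hypothetical helper)."""
    n, d = data.shape
    # Clip every record to L2 norm <= clip_norm so that a single record
    # has bounded influence on the mean.
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    clipped = data * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Assumption: neighboring datasets differ by replacing one record,
    # so the mean changes by at most 2 * clip_norm / n in L2 norm.
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm / n)
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)


def rms_noise_norm(d, n, clip_norm, epsilon, delta):
    """Root-mean-square L2 norm of the d-dimensional noise vector."""
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm / n)
    return sigma * np.sqrt(d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (10, 100, 1000):
        data = rng.normal(size=(1000, d))
        noisy = private_mean(data, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
        # The clipped mean has norm at most clip_norm for every d,
        # while the noise norm grows like sqrt(d).
        print(d, noisy.shape, rms_noise_norm(d, 1000, 1.0, 0.5, 1e-5))
```

Because the clipped signal has norm at most `clip_norm` regardless of `d`, while the noise norm scales with the square root of `d`, the noise increasingly dominates as the number of features grows. That is the general difficulty the abstract describes for high-dimensional data.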