Applying Deep Learning to Preserve Data Confidentiality Keynote Address

Abstract

Preserving data confidentiality is a crucial problem when releasing microdata for public use. Many approaches have been proposed so far, and most of them rely on traditional probability and statistics to mask the original data. However, their performance needs to be significantly improved in practice. In this paper, we approach the problem with a deep learning-based generative model, which can generate simulated data that are statistically close to the raw data but differ from them item by item. A generative model works by transforming samples drawn from a noise distribution (such as a uniform) into samples that follow the distribution of a real dataset (such as a Gaussian). Because statistical differences remain between the generated data and the raw data, it is hard to guarantee in practice that the generated data represent the raw data. Despite deep learning's strong generative ability, the same issue still exists. In this study, we explore the statistical similarity between the two datasets via a deep learning-based generative model, and we introduce two statistical evaluation metrics to assess that similarity. We conducted extensive experiments on two real-world datasets, a census dataset and an environmental dataset, to validate our idea.
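
The noise-to-data transformation and the similarity check described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name its model or its two metrics, so a closed-form inverse-CDF transform stands in for a trained generator, and a two-sample Kolmogorov-Smirnov statistic plus a mean and standard deviation gap stand in for the evaluation metrics. The `raw` column is hypothetical simulated data, not the census or environmental dataset.

```python
import bisect
import random
from statistics import NormalDist, fmean, pstdev


def generate_synthetic(raw, n, rng):
    """Map uniform noise onto a Gaussian fitted to `raw` (inverse-CDF transform).

    Stand-in for a learned generator; not the method used in the paper.
    """
    target = NormalDist(fmean(raw), pstdev(raw))
    # rng.random() lies in [0, 1); clamp away from 0 because inv_cdf needs 0 < p < 1.
    return [target.inv_cdf(max(rng.random(), 1e-12)) for _ in range(n)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        cdf_a = bisect.bisect_right(sa, x) / len(sa)
        cdf_b = bisect.bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def moment_gap(a, b):
    """Absolute differences in mean and in standard deviation."""
    return abs(fmean(a) - fmean(b)), abs(pstdev(a) - pstdev(b))


rng = random.Random(0)
# Hypothetical stand-in for one numeric microdata column.
raw = [rng.gauss(50.0, 10.0) for _ in range(2000)]
synthetic = generate_synthetic(raw, 2000, rng)

ks = ks_statistic(raw, synthetic)              # small value: similar distributions
mean_gap, std_gap = moment_gap(raw, synthetic)
overlap = len(set(raw) & set(synthetic))       # values copied verbatim from raw
print(f"KS={ks:.3f} mean_gap={mean_gap:.3f} std_gap={std_gap:.3f} overlap={overlap}")
```

A small KS statistic together with small moment gaps suggests the synthetic column tracks the raw distribution, while an overlap of zero shows that no raw value is reproduced verbatim. Neither check is a formal confidentiality guarantee.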