
Using deep learning to preserve data confidentiality


Abstract

Preserving data confidentiality is crucial when releasing microdata for public use. A variety of approaches have been proposed, many of them based on traditional probability theory and statistics, and most focus on masking the original data. In practice, these masking techniques cover only part of the data and still risk exposing sensitive values on release. In this paper, we approach the problem with a deep learning-based generative model that produces simulated data to stand in for the original data. Generating simulated data with the same statistical characteristics as the raw data is both the key idea and the main challenge of this study. In particular, we examine the statistical similarity between the raw data and the generated data, given that the two are not obviously distinguishable. We evaluate our results with two statistical metrics: Absolute Relative Residual Values and Hellinger Distance. We also conduct extensive experiments on two real-world datasets, the Census Dataset and the Environmental Dataset, to validate the idea.
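To make the two evaluation metrics concrete, the following is a minimal sketch in Python, not the paper's implementation. The Hellinger distance uses its standard definition for discrete distributions, H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2 / 2). The abstract does not define Absolute Relative Residual Values, so the form used here, |raw - generated| / |raw| applied to a summary statistic, is an assumption. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(values, categories):
    """Relative frequency of each category in `values`, in the order of `categories`."""
    counts = Counter(values)
    n = len(values)
    return [counts[c] / n for c in categories]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def absolute_relative_residual(raw_stat, generated_stat):
    """ASSUMED definition: |raw - generated| / |raw| for one summary statistic."""
    if raw_stat == 0:
        raise ValueError("raw statistic must be non-zero")
    return abs(raw_stat - generated_stat) / abs(raw_stat)


if __name__ == "__main__":
    # Toy categorical column (hypothetical values, not from the paper's datasets).
    raw = ["a", "a", "b", "c"]
    generated = ["a", "b", "b", "c"]
    categories = ["a", "b", "c"]

    p = empirical_distribution(raw, categories)
    q = empirical_distribution(generated, categories)
    print("Hellinger distance:", hellinger_distance(p, q))

    # Compare a summary statistic (here, a mean) of a numeric column.
    raw_mean, generated_mean = 10.0, 9.0
    print("Absolute relative residual:", absolute_relative_residual(raw_mean, generated_mean))
```

Under these assumptions, smaller values of both metrics indicate that the generated data track the raw data more closely.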
