
Data Anonymization for Privacy Aware Machine Learning



Abstract

The rise in data leaks, attacks, and ransomware over the last few years has raised concerns about data security and privacy. This has negatively affected the sharing and publication of data. To address these limitations, innovative techniques are needed to protect data, especially when the data are used in machine-learning-based models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, its applications have so far been largely limited to certain data types (e.g., numerical and structured data). Therefore, in this study, we investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two differential privacy algorithms based on Principal Component Analysis. Its effectiveness was demonstrated by applying three machine learning models to both the anonymized and the original data. Their performance was thoroughly evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the expense of long computation time, while the DPCA method preserves more utility and runs faster. We show that a neural network text representation approach can be combined with differential privacy methods. We also show that anonymizing real-world measurement data from satellite sensors for an anomaly detection task is well within reach. We believe that our study will significantly motivate the use of differential privacy techniques, which can lead to more data sharing and better privacy preservation.
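The abstract does not specify how the PPCA and DPCA algorithms work, so the following is only a minimal sketch of one generic way to make PCA differentially private: perturbing the second-moment matrix with symmetric Gaussian noise before the eigendecomposition. It is not the authors' method. The function name `dp_pca_components`, its parameters, the row-clipping step, and the choice of the classical Gaussian mechanism calibration (which assumes `epsilon < 1`) are all assumptions made for illustration.

```python
import numpy as np


def dp_pca_components(X, k, epsilon, delta, rng=None):
    """Hypothetical helper: top-k principal directions from a noisy second-moment matrix.

    Illustrative sketch only; not the PPCA or DPCA algorithm from the study.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)

    # Assumption: scale every row to L2 norm <= 1, so that adding or removing
    # one record changes X^T X by at most 1 in Frobenius norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1.0)

    d = X.shape[1]
    # Uncentered second-moment matrix (centering is skipped on purpose,
    # because the data mean would itself leak information).
    second_moment = X.T @ X

    # Classical Gaussian mechanism noise scale for sensitivity 1.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    # Draw noise for the upper triangle and mirror it to keep the matrix symmetric.
    noise = rng.normal(0.0, sigma, size=(d, d))
    sym_noise = np.triu(noise) + np.triu(noise, 1).T

    # Eigendecomposition of the perturbed matrix; keep the k largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(second_moment + sym_noise)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]
```

Under these assumptions, a call such as `W = dp_pca_components(X, k=2, epsilon=0.5, delta=1e-5)` returns a `d x 2` matrix with orthonormal columns. Only the directions in `W` carry the privacy guarantee in this sketch. Projecting the raw rows onto `W` and releasing the result would need its own privacy analysis.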
