Data Anonymization for Privacy Aware Machine Learning


Abstract

The rise in data leaks, attacks, and ransomware over the last few years has raised concerns about data security and privacy, and has discouraged the sharing and publication of data. Addressing these limitations requires innovative techniques for protecting data, especially when the data are used in machine-learning-based models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, the scope of differential privacy applications remains limited (e.g., to numerical and structured data). In this study, we therefore investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two differential privacy algorithms based on Principal Component Analysis. Its effectiveness was demonstrated by applying three machine learning models to both the anonymized and the original data. Their performance was evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the expense of long computation time, while the DPCA method preserves more utility and runs faster. We show that a neural network text representation approach can be combined with differential privacy methods. We also show that it is feasible to anonymize real-world measurement data from satellite sensors for an anomaly detection task. We believe that our study will encourage the use of differential privacy techniques, which can lead to more data sharing and better privacy preservation.
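The abstract does not spell out the PPCA or DPCA algorithms, so the following is not the authors' method. It is a minimal sketch of one generic way to combine Principal Component Analysis with differential privacy: adding symmetric Gaussian noise to the second-moment matrix before the eigendecomposition. The function name `dp_pca_components`, the row clipping to unit norm, and the noise calibration (the classical Gaussian mechanism, usually stated for 0 < epsilon < 1) are all assumptions made for illustration.

```python
import numpy as np


def dp_pca_components(X, n_components, epsilon, delta, seed=None):
    """Return principal directions of a noise-perturbed second-moment matrix.

    Hypothetical helper for illustration, not the paper's PPCA or DPCA.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    d = X.shape[1]

    # Scale down any row whose L2 norm exceeds 1, so adding or removing one
    # record changes X^T X by at most 1 in Frobenius norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clipped = X / np.maximum(norms, 1.0)

    # Classical Gaussian-mechanism noise scale for L2 sensitivity 1
    # (assumption: the standard analysis requires 0 < epsilon < 1).
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    # Symmetric noise: draw the upper triangle and mirror it.
    noise = rng.normal(0.0, sigma, size=(d, d))
    noise = np.triu(noise) + np.triu(noise, 1).T

    noisy_moment = X_clipped.T @ X_clipped + noise

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(noisy_moment)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


# Usage on synthetic data (the data and parameter values are made up).
X = np.random.default_rng(0).normal(size=(200, 5))
W = dp_pca_components(X, n_components=2, epsilon=0.5, delta=1e-5, seed=0)
print(W.shape)  # (5, 2)
```

In this sketch only the returned directions `W` carry the privacy guarantee. Projecting the raw records onto `W` does not by itself make the projected records private, so a comparison of models trained on anonymized versus original data, as described in the abstract, would need an anonymization step beyond what is shown here.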


Bibliographic record

Source
International Conference on Machine Learning, Optimization, and Data Science | 2019 | 772p | 13 pages in total
Authors
David Nizar Jaidan; Maxime Carrere; Zakaria Chemli; Remi Poisvert
Original format
PDF
Chinese Library Classification
TP181-53
Keywords
Privacy; Anonymization; Machine learning; Text encoding; Natural language processing; Time series; Anomaly detection


Similar literature


1. Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data [J]. Abdul Majeed, Farman Ullah, Sungchang Lee. Sensors. 2017, No. 5

2. Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud [J]. Zhang Xuyun, Dou Wanchun, Pei Jian. IEEE Transactions on Computers. 2015, No. 8

3. Federated Learning and Privacy: Building privacy-preserving systems for machine learning and data science on decentralized data [J]. Kallista Bonawitz, Peter Kairouz, Brendan McMahan. ACM Queue: Architecting Tomorrow's Computing. 2021, No. 5

4. Data Anonymization for Privacy Aware Machine Learning [C]. David Nizar Jaidan, Maxime Carrere, Zakaria Chemli. International Conference on Machine Learning, Optimization, and Data Science. 2019

5. Privacy-Preserving Machine Learning via Data Compression & Differential Privacy [D]. Chanyaswad, Theerachai. 2018

6. Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data [O]. Abdul Majeed, Farman Ullah, Sungchang Lee. 2017

7. Availability, Reliability, and Security in Information Systems: IFIP WG 8.4, 8.9, TC 5 International Cross-Domain Conference, CD-ARES 2016, and Workshop on Privacy Aware Machine Learning for Health Data Science, PAML 2016, Salzburg, Austria, August 31 - September 2, 2016, Proceedings [O]. Buccafurri, Francesco; Holzinger, Andreas; Kieseberg, Peter. 2016
