Conference: IEEE International Conference on Trust, Security and Privacy in Computing and Communications

Privacy-Encoding Models for Preserving Utility of Machine Learning Algorithms in Social Media


Abstract

Social media has become a vital platform in daily life, where users can interact with friends and other people throughout the world. The vast data generated by these platforms is unique in its variety and sensitivity, and although it has significant potential utility, it also has the potential for misuse. Although social media providers apply some existing privacy techniques, such as encryption and anonymization, these techniques cannot achieve a solid level of data privacy while maintaining the highest level of data utility. This paper proposes new Privacy-Encoding (PE) models that contain two levels of data privacy: 1) data perturbation-based encoding techniques, and 2) data normalization-based scaling techniques. The data perturbation-based encoding techniques are label encoding and one-hot encoding, while the data normalization-based scaling techniques are min-max and z-score normalization. The aim of the two levels is to transform original data into perturbed data while preserving a high level of data utility for machine learning algorithms. To evaluate data utility, the proposed models are applied to the Adult dataset as well as a simulated social media dataset, and the accuracy of the results is compared across several machine learning algorithms. The experimental results reveal that the models could achieve high privacy and utility levels in terms of variance, accuracy and F-measure metrics.
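The two levels named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the function names (`label_encode`, `one_hot_encode`, `min_max_scale`, `z_score_scale`, `privacy_encode`) are hypothetical, the sample values are invented, and chaining label encoding into a scaler is only one plausible way of combining the two levels, not necessarily the authors' method.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Level 1 (assumed): map each distinct category to an integer code."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Level 1 alternative (assumed): one 0/1 indicator per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def min_max_scale(xs):
    """Level 2 (assumed): rescale numeric values into [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def z_score_scale(xs):
    """Level 2 alternative (assumed): zero mean, unit population std."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


def privacy_encode(values, scaler=min_max_scale):
    """Hypothetical two-level pipeline: label-encode, then scale."""
    codes, _ = label_encode(values)
    return scaler(codes)


# Invented sample column standing in for a categorical attribute.
occupations = ["teacher", "nurse", "teacher", "engineer"]
encoded_min_max = privacy_encode(occupations)
encoded_z = privacy_encode(occupations, scaler=z_score_scale)
```

In this sketch the perturbed column no longer contains the original category strings, yet equal inputs still map to equal outputs, which is the property a downstream classifier would rely on for utility.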
