The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing

Abstract

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each quasi-identifier tuple appear in at least k records, while l-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier. In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization, which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as the accuracy of data-mining algorithms executed on the same sanitized records. For our experimental evaluation, we use the same datasets from the UCI machine learning repository as were used in previous research on generalization and suppression. Our results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility. In most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l-diversity, and similar methods based on generalization and suppression.
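
The syntactic definitions in the abstract can be made concrete with a minimal Python sketch. It is not the authors' code: the toy records, the column layout (two quasi-identifier columns followed by one sensitive column), and the function names are all hypothetical, and the entropy check is only one simple reading of "high entropy" for l-diversity. The last function illustrates one possible form of trivial sanitization, publishing quasi-identifiers and sensitive values as two unlinked lists.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (zip_code, birth_year, diagnosis).
# The first two columns act as quasi-identifiers (already generalized),
# the last column is the sensitive attribute.
RECORDS = [
    ("787**", "197*", "flu"),
    ("787**", "197*", "asthma"),
    ("787**", "197*", "flu"),
    ("100**", "198*", "diabetes"),
    ("100**", "198*", "diabetes"),
]


def group_by_quasi_identifier(records, qi_len=2):
    """Map each quasi-identifier tuple to the sensitive values it co-occurs with."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[:qi_len]].append(rec[qi_len])
    return dict(groups)


def is_k_anonymous(records, k, qi_len=2):
    """True if every quasi-identifier tuple appears in at least k records."""
    groups = group_by_quasi_identifier(records, qi_len)
    return all(len(values) >= k for values in groups.values())


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def min_group_entropy(records, qi_len=2):
    """Smallest sensitive-attribute entropy over all quasi-identifier groups."""
    groups = group_by_quasi_identifier(records, qi_len)
    return min(shannon_entropy(values) for values in groups.values())


def trivial_sanitization(records, qi_len=2):
    """Assumed form of trivial sanitization: release quasi-identifiers and
    sensitive values as two independently sorted lists, so that no row-level
    link between them is published."""
    quasi_identifiers = sorted(rec[:qi_len] for rec in records)
    sensitive_values = sorted(rec[qi_len] for rec in records)
    return quasi_identifiers, sensitive_values


if __name__ == "__main__":
    print(is_k_anonymous(RECORDS, 2))   # groups have sizes 3 and 2
    print(is_k_anonymous(RECORDS, 3))
    print(min_group_entropy(RECORDS))   # the all-"diabetes" group has zero entropy
    print(trivial_sanitization(RECORDS))
```

On this toy table the 2-anonymity check passes, yet one group has zero entropy in its sensitive attribute: anyone known to be in that group is revealed to have the same diagnosis. This is a small illustration of the abstract's point that k-anonymization alone does not guarantee privacy.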

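Bibliographic details

Source: ACM KDD International Conference on Knowledge Discovery and Data Mining, 2008, 9 pages
Authors: Justin Brickell, Vitaly Shmatikov
Original format: PDF
Chinese Library Classification: Information and knowledge dissemination
Keywords: algorithms; security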

Similar literature

1. Attribute susceptibility and entropy based data anonymization to improve users community privacy and utility in publishing data [J]. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2020, no. 8.
2. Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data [J]. Abdul Majeed, Farman Ullah, Sungchang Lee. Sensors. 2017, no. 5.
3. IMPROVED K-ANONYMIZE AND L-DIVERSE APPROACH FOR PRIVACY PRESERVING BIG DATA PUBLISHING USING MPSEC DATASET [J]. Jain Priyank, Gyanchandani Manasi, Khare Nilay. Computing and Informatics. 2020, no. 3.
4. The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing [C]. Justin Brickell, Vitaly Shmatikov. ACM KDD International Conference on Knowledge Discovery and Data Mining; KDD 2008. 2008.
5. Privacy and utility analysis of the randomization approach in Privacy-Preserving Data Publishing [D]. Huang, Zhengli. 2008.
6. Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data [O]. Abdul Majeed, Farman Ullah, Sungchang Lee. 2017.
7. Attribute susceptibility and entropy based data anonymization to improve users community privacy and utility in publishing data [O]. Abdul Majeed, Sungchang Lee. 2020.
