The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing

Abstract

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each quasi-identifier tuple appear in at least k records, while ℓ-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier.

In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization, which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as the accuracy of data-mining algorithms executed on the same sanitized records.

For our experimental evaluation, we use the same datasets from the UCI machine learning repository as were used in previous research on generalization and suppression. Our results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility. In most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, ℓ-diversity, and similar methods based on generalization and suppression.
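The syntactic conditions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The toy records, the function names, and the choice of the entropy variant of ℓ-diversity (every quasi-identifier group must have sensitive-value entropy of at least log2(ℓ)) are assumptions made for illustration, and `trivial_sanitization` is only one possible reading of "separates quasi-identifiers from sensitive attributes".

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (ZIP code, birth year, sensitive diagnosis).
RECORDS = [
    ("78701", 1970, "flu"),
    ("78701", 1970, "cancer"),
    ("78701", 1970, "flu"),
    ("78705", 1982, "flu"),
    ("78705", 1982, "flu"),
]


def group_by_quasi_identifier(records):
    """Map each quasi-identifier tuple to the sensitive values it co-occurs with."""
    groups = defaultdict(list)
    for *quasi_identifier, sensitive in records:
        groups[tuple(quasi_identifier)].append(sensitive)
    return groups


def is_k_anonymous(records, k):
    """True if every quasi-identifier tuple appears in at least k records."""
    groups = group_by_quasi_identifier(records)
    return all(len(values) >= k for values in groups.values())


def entropy_bits(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def is_entropy_l_diverse(records, l):
    """Assumed entropy variant: each group's entropy is at least log2(l)."""
    groups = group_by_quasi_identifier(records)
    return all(entropy_bits(values) >= math.log2(l) for values in groups.values())


def trivial_sanitization(records):
    """Release quasi-identifiers and sensitive values as two unlinked, sorted tables."""
    qi_table = sorted(tuple(record[:-1]) for record in records)
    sensitive_table = sorted(record[-1] for record in records)
    return qi_table, sensitive_table
```

On these toy records, `is_k_anonymous(RECORDS, 2)` holds because the two quasi-identifier groups contain three and two records, while `is_entropy_l_diverse(RECORDS, 2)` fails because the second group contains only one distinct diagnosis.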


Bibliographic details

Source
ACM KDD International Conference on Knowledge Discovery and Data Mining; KDD 2008 | 2008 | pp. 61-69 | 9 pages
Authors
Justin Brickell; Vitaly Shmatikov
Original format
PDF
Classification (Chinese Library Classification)
Information and knowledge dissemination
Keywords
algorithms; security

Similar literature

1. Attribute susceptibility and entropy based data anonymization to improve users community privacy and utility in publishing data [J]. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2020, No. 8.
2. Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data [J]. Abdul Majeed, Farman Ullah, Sungchang Lee. Sensors. 2017, No. 5.
3. IMPROVED K-ANONYMIZE AND L-DIVERSE APPROACH FOR PRIVACY PRESERVING BIG DATA PUBLISHING USING MPSEC DATASET [J]. Jain Priyank, Gyanchandani Manasi, Khare Nilay. Computing and Informatics. 2020, No. 3.
4. The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing [C]. ACM KDD International Conference on Knowledge Discovery and Data Mining. 2008.
5. Privacy and utility analysis of the randomization approach in Privacy-Preserving Data Publishing [D]. Huang, Zhengli. 2008.
6. Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data [O]. Abdul Majeed, Farman Ullah, Sungchang Lee. 2017.
7. Attribute susceptibility and entropy based data anonymization to improve users community privacy and utility in publishing data [O]. Abdul Majeed, Sungchang Lee. 2020.
