Achieving anonymity via clustering

Abstract

Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach to de-identifying records is to remove identifying fields such as social security number and name. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code [15]. Sweeney [16] proposed the k-anonymity model for privacy, in which non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for the quasi-identifiers. We propose a new method for anonymizing data records, in which the quasi-identifiers of the data records are first clustered and then the cluster centers are published. To ensure the privacy of the data records, we impose the constraint that each cluster must contain no fewer than a pre-specified number of data records. This technique is more general than k-anonymity, since it offers a much larger choice of cluster centers. In many cases, it lets us release considerably more information without compromising privacy. We also provide constant-factor approximation algorithms for finding such a clustering. This is the first set of algorithms for the anonymization problem whose approximation guarantee is independent of the anonymity parameter k. We further observe that a few outlier points can significantly increase the cost of anonymization. Hence, we extend our algorithms to allow an ε fraction of points to remain unclustered, i.e., deleted from the anonymized publication. Thus, by not releasing a small fraction of the database records, we can ensure that the data published for analysis has less distortion and is therefore more useful.
Our approximation algorithms for new clustering objectives are of independent interest and could be applicable in other clustering scenarios as well.
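
The publication scheme described in the abstract (cluster the quasi-identifiers, require every cluster to hold at least r records, release cluster centers, optionally withhold an ε fraction of outliers) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' method: quasi-identifiers are assumed to be numeric vectors, the grouping is a hypothetical greedy heuristic (take a seed and its r−1 nearest unassigned neighbours) rather than the paper's constant-factor approximation algorithms, and outliers are chosen by a simple sparseness score. The names `is_k_anonymous`, `gather_clusters`, and `publish` are hypothetical.

```python
import math
from collections import Counter

Point = tuple[float, ...]


def is_k_anonymous(rows: list[tuple], k: int) -> bool:
    """True if every quasi-identifier tuple occurs in at least k rows,
    i.e. each row agrees with at least k - 1 other rows."""
    return all(count >= k for count in Counter(rows).values())


def centroid(points: list[Point], members: list[int]) -> Point:
    """Coordinate-wise mean of the points indexed by members."""
    return tuple(sum(col) / len(members) for col in zip(*(points[i] for i in members)))


def gather_clusters(points: list[Point], r: int, eps: float = 0.0):
    """Hypothetical greedy heuristic (NOT the paper's constant-factor algorithm).

    Drops floor(eps * n) likely outliers, then partitions the remaining indices
    into clusters of at least r points. Returns (clusters, dropped).
    """
    n = len(points)
    if r < 1 or n < 2:
        raise ValueError("need r >= 1 and at least two points")
    m = max(r - 1, 1)  # neighbour rank used to score sparseness

    def sparseness(i: int) -> float:
        # Distance from point i to its m-th nearest other point.
        ds = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
        return ds[min(m, len(ds)) - 1]

    # Assumed outlier rule: withhold the points in the sparsest neighbourhoods.
    n_drop = math.floor(eps * n)
    dropped = sorted(sorted(range(n), key=sparseness, reverse=True)[:n_drop])
    unassigned = set(range(n)) - set(dropped)
    if len(unassigned) < r:
        raise ValueError("fewer than r points remain after dropping outliers")

    clusters: list[list[int]] = []
    while len(unassigned) >= r:
        seed = min(unassigned)  # deterministic seed choice
        # The seed has distance 0 and the smallest index, so it is always included.
        group = sorted(unassigned, key=lambda j: (math.dist(points[seed], points[j]), j))[:r]
        clusters.append(group)
        unassigned -= set(group)

    # Fewer than r leftovers: attach each to the cluster with the nearest centroid.
    for j in sorted(unassigned):
        best = min(clusters, key=lambda c: math.dist(points[j], centroid(points, c)))
        best.append(j)
    return clusters, dropped


def publish(points: list[Point], r: int, eps: float = 0.0):
    """Replace each kept record by its cluster centre.

    Returns (rows, dropped, max_radius); rows follow the original record order.
    """
    clusters, dropped = gather_clusters(points, r, eps)
    center_of: dict[int, Point] = {}
    max_radius = 0.0
    for c in clusters:
        center = centroid(points, c)
        for i in c:
            center_of[i] = center
            max_radius = max(max_radius, math.dist(points[i], center))
    rows = [center_of[i] for i in sorted(center_of)]
    return rows, dropped, max_radius


# Toy quasi-identifiers: two tight groups plus one far-away point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
rows, dropped, radius = publish(pts, r=3)                  # outlier kept
rows_e, dropped_e, radius_e = publish(pts, r=3, eps=0.15)  # one record withheld
print(dropped_e, round(radius, 2), round(radius_e, 2))     # [6] 42.07 0.75
```

Because every released row is a cluster center shared by at least r records, the published rows pass `is_k_anonymous(rows, r)` even though the raw points are all distinct. On this toy data, keeping the far point (50, 50) forces it into a cluster and inflates the largest point-to-center distance to about 42; withholding it with `eps=0.15` shrinks that distance to about 0.75, which mirrors the abstract's remark that a few outliers can dominate the cost of anonymization.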

Bibliographic details

Source
ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2006, pp. 153-162 (10 pages)

Authors
Gagan Aggarwal; Tomas Feder; Krishnaram Kenthapadi; Samir Khuller; Rina Panigrahy; Dilys Thomas; An Zhu

Original format: PDF

Chinese Library Classification: TP311.13

Keywords
privacy

Similar literature

Foreign-language literature

1. Achieving anonymity via clustering [J]. Aggarwal G., Feder T., Kenthapadi K. ACM Transactions on Algorithms. 2010, No. 3.

2. Achieving Personalized k-Anonymity-Based Content Privacy for Autonomous Vehicles in CPS [J]. Wang Jinbao, Cai Zhipeng, Yu Jiguo. IEEE Transactions on Industrial Informatics. 2020, No. 6.

3. PAT: A precise reward scheme achieving anonymity and traceability for crowdcomputing in public clouds [J]. Huaqun Wang, Debiao He, Yanfei Sun. Future Generation Computer Systems. 2018, No. PTa1.

4. Achieving k-Anonymity Via a Density-Based Clustering Method [C]. Hua Zhu, Xiaojun Ye. Asia-Pacific Web Conference (APWeb 2007); International Conference on Web-Age Information Management (WAIM 2007); 2007-06-16 to 2007-06-18; Huang Shan (CN). 2007.

5. Achieving guaranteed anonymity in time-series location data [D]. Hoh, Baik. 2008.

6. Efficient and Anonymous Two-Factor User Authentication in Wireless Sensor Networks: Achieving User Anonymity with Lightweight Sensor Computation [O]. Junghyun Nam, Kim-Kwang Raymond Choo, Sangchul Han.

7. Achieving anonymity via clustering [O]. Gagan Aggarwal, Samir Khuller, Tomás Feder. 2013.

Chinese literature

1. A graph-clustering anonymization method for privacy protection implemented with a genetic algorithm [J]. Jiang Huowen, Zeng Guosun, Hu Kekun. Journal of Computer Research and Development. 2016, No. 10.

2. An arithmetic-progression clustering anonymization algorithm for social network data [J]. Liu Zhenpeng, Dong Shuhui, Li Zeyuan. Journal of Zhengzhou University (Natural Science Edition). 2022, No. 1.

3. A k-anonymity privacy protection method based on constrained clustering [J]. Wu Mengting, Sun Liping, Liu Yuanjun. Computer Engineering and Design. 2021, No. 3.

4. A personalized anonymity protection method based on variable-length clustering [J]. Li Dan, Ling Jie. Computer Engineering and Applications. 2018, No. 8.

5. A greedy clustering anonymization method for privacy protection in table data publishing [J]. Jiang Huowen, Zeng Guosun, Ma Haiying. Journal of Software. 2017, No. 2.

6. An anonymization algorithm based on efficient multi-attribute re-clustering [C]. Li Ning, Zhu Qing. 2011 China Computer Federation Conference on Service Computing (CCF NCSC2011). 2011.

7. A clustering-based anonymous location privacy protection algorithm [A]. Qiao Qinqin. 2017.

Patents

1. Anonymous message exchange system with selectable degree of anonymity and implementation method thereof [P]. Chinese patent CN104125142B. 2017-08-11.

2. Anonymous message exchange system with selectable degree of anonymity and implementation method thereof [P]. Chinese patent CN104125142A. 2014-10-29.

3. Achieving buyer-seller anonymity for unsophisticated users under collusion amongst intermediaries [P]. Foreign patent US6873977B1. 2005-03-29.

4. ANONYMITY ID GENERATING DEVICE, ANONYMITY ID DECODING DEVICE, ANONYMITY ID GENERATION METHOD, AND ANONYMITY ID GENERATING PROGRAM [P]. Foreign patent JP2015226265A. 2015-12-14.

5. ANONYMITY CREDIT INFORMATION IMPARTING SYSTEM, ANONYMITY CREDIT INFORMATION GENERATING DEVICE, CREDIT INFORMATION VERIFYING DEVICE, ANONYMITY INFORMATION VERIFYING DEVICE, ANONYMITY CREDIT INFORMATION RECEIVING DEVICE, AND PROGRAM [P]. Foreign patent JP2006078886A. 2006-03-23.