
Using Anonymized Data for Classification



Abstract

In recent years, anonymization methods have emerged as an important tool for preserving individual privacy when releasing privacy-sensitive data sets. This interest in anonymization techniques has produced a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research on how to use the anonymized data effectively for data mining in general, and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers from anonymized data by modeling anonymized data as uncertain data. Our method does not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing them together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives in both local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.
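To make the idea of treating anonymized records as uncertain data more concrete, the following is a minimal illustrative sketch, not the authors' actual algorithm. It assumes that each released record carries a generalized interval per attribute plus a per-attribute mean and variance collected during anonymization; the abstract does not say which statistics are released, so these two are an assumption. It then compares a nearest-neighbor rule based on the expected squared distance, using the identity E[(X - q)^2] = (E[X] - q)^2 + Var[X], with a midpoint heuristic. All names (`AnonymizedRecord`, `expected_squared_distance`, `classify_1nn`, `midpoint_1nn`) and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AnonymizedRecord:
    """One released record (hypothetical layout, not taken from the paper)."""
    lower: Sequence[float]     # lower bounds of the generalized intervals
    upper: Sequence[float]     # upper bounds of the generalized intervals
    mean: Sequence[float]      # assumed released statistic: per-attribute mean
    variance: Sequence[float]  # assumed released statistic: per-attribute variance
    label: str                 # class label


def expected_squared_distance(record: AnonymizedRecord, query: Sequence[float]) -> float:
    """Expected squared Euclidean distance between an uncertain record and a point.

    Uses E[(X - q)^2] = (E[X] - q)^2 + Var[X] for each attribute, so only the
    first two moments are needed and no distribution is assumed.
    """
    return sum(
        (m - q) ** 2 + v
        for m, v, q in zip(record.mean, record.variance, query)
    )


def classify_1nn(records: Sequence[AnonymizedRecord], query: Sequence[float]) -> str:
    """1-nearest-neighbor rule under the expected squared distance."""
    return min(records, key=lambda r: expected_squared_distance(r, query)).label


def midpoint_1nn(records: Sequence[AnonymizedRecord], query: Sequence[float]) -> str:
    """Heuristic baseline: replace every interval by its midpoint."""
    def dist(r: AnonymizedRecord) -> float:
        return sum(
            ((lo + hi) / 2 - q) ** 2
            for lo, hi, q in zip(r.lower, r.upper, query)
        )
    return min(records, key=dist).label


# Made-up example: one attribute, two equivalence classes with skewed contents.
records = [
    AnonymizedRecord(lower=[20.0], upper=[40.0], mean=[38.0], variance=[1.0], label="A"),
    AnonymizedRecord(lower=[30.0], upper=[50.0], mean=[48.0], variance=[1.0], label="B"),
]
query = [36.0]

print(classify_1nn(records, query))   # uses the released statistics
print(midpoint_1nn(records, query))   # uses interval midpoints only
```

In this made-up data the two rules disagree because the values inside each interval are skewed away from the midpoint, which is the kind of information that released statistics can preserve and a midpoint heuristic discards.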
