Journal: Knowledge and Information Systems

A hybrid multi-group approach for privacy-preserving data mining


Abstract

In this paper, we propose a hybrid multi-group approach for privacy-preserving data mining and make two contributions. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party computation (SMC) approach. These two approaches have complementary features: the randomization approach is much more efficient but less accurate, while the SMC approach is less efficient but more accurate. Our novel hybrid approach takes advantage of the strengths of both to balance the accuracy and efficiency constraints. Compared with the two existing approaches, it achieves much better accuracy than the randomization approach and much lower computation cost than the SMC approach. Second, we propose a multi-group scheme that gives the data miner flexible control over the balance between data mining accuracy and privacy. This scheme is motivated by the fact that existing randomization schemes, which randomize data at the individual attribute level, can produce insufficient accuracy when the number of dimensions is high. We partition attributes into groups and develop a group-based randomization scheme to achieve better data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3 decision tree algorithm and the association rule mining problem, and we present experimental results.
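
The abstract does not spell out the randomization mechanism, so the following is only a minimal sketch of what group-based randomization could look like, assuming binary attributes and a randomized-response style perturbation. The names `randomize_groups`, `estimate_proportion`, and the parameter `theta` are hypothetical and are not taken from the paper. Each attribute group is either reported truthfully or reported with every bit complemented, and the choice is made independently per group.

```python
import random


def randomize_groups(record, groups, theta, rng):
    """Perturb a binary record one attribute group at a time.

    Assumption (not from the paper): for each group independently, the
    true bits are kept with probability theta; otherwise every bit in
    that group is replaced by its complement.
    """
    out = list(record)
    for group in groups:
        if rng.random() >= theta:  # happens with probability 1 - theta
            for i in group:
                out[i] = 1 - out[i]
    return out


def estimate_proportion(observed_fraction, theta):
    """Recover the true fraction p of 1s for a single binary attribute.

    Under the scheme above, observed = theta * p + (1 - theta) * (1 - p),
    which is solved here for p. It is undefined at theta = 0.5.
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 destroys all information")
    return (observed_fraction - (1 - theta)) / (2 * theta - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    record = [1, 0, 1, 1, 0]
    groups = [[0, 1], [2, 3, 4]]  # hypothetical partition of 5 attributes
    print(randomize_groups(record, groups, theta=0.8, rng=rng))
```

In this sketch, a single group containing all attributes flips the whole record together, while one group per attribute reduces to attribute-level randomization. The group partition is therefore the knob that trades accuracy against privacy in the sense the abstract describes.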
