Journal: Pattern Analysis and Applications

On utilizing dependence-based information to enhance micro-aggregation for secure statistical databases


Abstract

We consider the micro-aggregation problem, which involves partitioning a set of individual records in a micro-data file into a number of mutually exclusive and exhaustive groups. This problem, which seeks the best partition of the micro-data file, is known to be NP-hard and has been tackled using many heuristic solutions. In this paper, we demonstrate that, in developing micro-aggregation techniques (MATs), it is expedient to incorporate information about the dependence between the random variables in the micro-data file. This can be achieved by pre-processing the micro-data before invoking any MAT, in order to extract the useful dependence information from the joint probability distribution of the variables, and then performing the micro-aggregation on the “maximally independent” variables, thus confirming a conjecture of Domingo-Ferrer et al. (IEEE Trans Knowl Data Eng 14(1):189–201, 2002). The conjecture was that micro-aggregation can be enhanced by incorporating dependence-based information between the random variables of the micro-data file, by working with (i.e., selecting) the maximally independent variables. Domingo-Ferrer et al. proposed selecting one variable from among the set of highly correlated variables inferred via the correlation matrix of the micro-data file. We demonstrate that this process can be automated, and that it is advantageous to select the “most independent variables” by using methods distinct from those involving the correlation matrix. Our results, on real-life and artificial data sets, show that including such information enhances the process of determining how many variables are to be used, and which of them should be used, in the micro-aggregation process.
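The two-stage idea in the abstract (first choose a subset of weakly dependent variables, then micro-aggregate on them) can be illustrated with a minimal sketch. This is not the paper's method: the selection step below uses a greedy correlation threshold, which is closer to the correlation-matrix baseline attributed to Domingo-Ferrer et al., and the aggregation step is a simple sort-and-group heuristic chosen only for brevity. The function names, the `threshold` value, and the group size `k` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_variables(columns, threshold=0.9):
    """Greedy pre-processing (assumed, correlation-based): keep a variable
    only if it is not highly correlated with any variable already kept."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) < threshold for i in kept):
            kept.append(j)
    return kept


def microaggregate(records, key_vars, k=3):
    """Toy MAT (assumed): sort records on the selected variables, cut the
    ordering into groups of at least k records, and replace every record
    by the centroid of its group."""
    order = sorted(range(len(records)),
                   key=lambda r: tuple(records[r][v] for v in key_vars))
    n = len(order)
    n_groups = max(1, n // k)
    dim = len(records[0])
    masked = [None] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so no group has fewer than k.
        end = (g + 1) * k if g < n_groups - 1 else n
        members = order[start:end]
        centroid = [sum(records[r][d] for r in members) / len(members)
                    for d in range(dim)]
        for r in members:
            masked[r] = centroid
    return masked


# Illustrative data: variable 1 is almost a multiple of variable 0.
records = [
    [1.0, 2.0, 5.0], [2.0, 4.1, 1.0], [3.0, 6.0, 4.0], [4.0, 8.2, 2.0],
    [5.0, 9.9, 3.0], [6.0, 12.0, 6.0], [7.0, 14.1, 0.0],
]
columns = list(zip(*records))
kept = select_variables(columns, threshold=0.9)
masked = microaggregate(records, kept, k=3)
```

In this toy data set the second variable is nearly a linear function of the first, so the selection step drops it and the grouping is driven by the remaining two variables. The released file still contains all three attributes, each replaced by its group mean.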
