Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion

Abstract

Before releasing databases which contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them, in order to avoid disclosure of sensitive information on any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records in such a way as to minimize the risk of disclosure of the sensitive information while providing data users with accurate information about the population of interest. In this paper we propose a new scheme for disclosure limitation, based on the idea of local synthesis of data. Our approach is predicated on model-based clustering. The proposed method satisfies the requirements of k-anonymity; in particular we use a variant of the k-anonymity privacy model, namely probabilistic k-anonymity, by incorporating constraints on cluster cardinality. Regarding data utility, for continuous attributes, we exactly preserve means and covariances of the original data, while approximately preserving higher-order moments and analyses on subdomains (defined by clusters and cluster combinations). For both continuous and categorical data, our experiments with medical data sets show that, from the point of view of data utility, local synthesis compares very favorably with other methods of disclosure limitation including the sequential regression approach for synthetic data generation.
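
The moment-preservation claim for continuous attributes can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the synthesis procedure, so the code below assumes cluster labels are already available (the paper obtains them by model-based clustering, which is not reproduced here), enforces the cardinality constraint as a simple check that every cluster holds at least k records, and uses a standard whiten-and-recolor step to draw synthetic records whose per-cluster sample mean and covariance match the originals. The function names `synthesize_cluster` and `local_synthesis` are hypothetical.

```python
import numpy as np


def synthesize_cluster(X, rng):
    """Return synthetic records with the same sample mean and covariance as X.

    Assumption: X has more rows than columns and a non-singular covariance,
    otherwise the Cholesky factorizations below raise LinAlgError.
    """
    n, d = X.shape
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))

    # Draw Gaussian noise and center it so its sample mean is exactly zero.
    Z = rng.standard_normal((n, d))
    Z -= Z.mean(axis=0)

    # Whiten: after this step the sample covariance of W is the identity.
    L_z = np.linalg.cholesky(np.atleast_2d(np.cov(Z, rowvar=False)))
    W = np.linalg.solve(L_z, Z.T).T

    # Recolor with the Cholesky factor of the original covariance.
    L_x = np.linalg.cholesky(cov)
    return mean + W @ L_x.T


def local_synthesis(X, labels, k, rng):
    """Replace each cluster's records by synthetic ones, cluster by cluster.

    `labels` is assumed to come from some prior clustering step; clusters
    smaller than k are rejected (a stand-in for a cardinality constraint).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Y = np.empty_like(X)
    for c in np.unique(labels):
        idx = labels == c
        n_c = int(idx.sum())
        if n_c < k:
            raise ValueError(f"cluster {c} has {n_c} records, fewer than k={k}")
        if n_c <= X.shape[1]:
            raise ValueError(f"cluster {c} is too small to estimate a covariance")
        Y[idx] = synthesize_cluster(X[idx], rng)
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 3))
    labels = np.repeat([0, 1, 2], 20)
    Y = local_synthesis(X, labels, k=5, rng=rng)
    print(np.allclose(np.cov(Y, rowvar=False), np.cov(X, rowvar=False)))
```

Because cluster sizes, cluster means, and within-cluster covariances are all unchanged, the overall mean and covariance of the synthetic data also match the original up to floating-point error, while higher-order moments are only approximated.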