首页> 美国卫生研究院文献>other >Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning
【2h】

Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

机译:通过基于采样的候选修剪进行差分私有频繁序列挖掘

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。
获取外文期刊封面目录资料

摘要

In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS2. The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy.
机译:在本文中,我们研究了在严格的差分隐私模型下挖掘频繁序列的问题。我们探索了设计差分私有频繁序列挖掘(FSM)算法的可能性,该算法既可以实现高数据实用性又可以实现高度隐私。我们发现,在差分私有FSM中,所需的噪声量与候选序列的数量成正比。如果我们可以有效地减少没有希望的候选序列的数量,那么实用性和隐私权衡就可以得到显着改善。为此,通过利用基于采样的候选修剪技术,我们提出了一种新颖的差分私有FSM算法,称为PFS 2 。我们算法的核心是利用样本数据库进一步修剪基于向下封闭属性生成的候选序列。特别是,我们使用样本数据库中候选序列的嘈杂局部支持来估计哪些序列可能频繁出现。为了提高这种私人估计的准确性,提出了一种序列收缩方法来对样本数据库施加长度约束。此外,为了减少不频繁地误估计频繁序列的可能性,提出了一种阈值松弛方法来松弛用户指定的样本数据库阈值。通过正式的隐私分析,我们表明我们的PFS 2 算法是ε-差分私有的。在真实数据集上进行的大量实验表明,我们的PFS 2 算法可以私密地找到频率很高的序列。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号