首页> 美国卫生研究院文献>other >Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

【2h】

Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

机译：通过基于采样的候选修剪进行差分私有频繁序列挖掘

代理获取

本网站仅为用户提供外文OA文献查询和代理获取服务，本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文，但由于OA文献来源多样且变更频繁，仍可能出现获取不到、文献不完整或与标题不符等情况，如果获取不到我们将提供退款服务。请知悉。

获取外文期刊封面目录资料

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS². The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS² algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS² algorithm can privately find frequent sequences with high accuracy.

机译：在本文中，我们研究了在严格的差分隐私模型下挖掘频繁序列的问题。我们探索了设计差分私有频繁序列挖掘（FSM）算法的可能性，该算法既可以实现高数据实用性又可以实现高度隐私。我们发现，在差分私有FSM中，所需的噪声量与候选序列的数量成正比。如果我们可以有效地减少没有希望的候选序列的数量，那么实用性和隐私权衡就可以得到显着改善。为此，通过利用基于采样的候选修剪技术，我们提出了一种新颖的差分私有FSM算法，称为PFS ^{2 。我们算法的核心是利用样本数据库进一步修剪基于向下封闭属性生成的候选序列。特别是，我们使用样本数据库中候选序列的嘈杂局部支持来估计哪些序列可能频繁出现。为了提高这种私人估计的准确性，提出了一种序列收缩方法来对样本数据库施加长度约束。此外，为了减少不频繁地误估计频繁序列的可能性，提出了一种阈值松弛方法来松弛用户指定的样本数据库阈值。通过正式的隐私分析，我们表明我们的PFS ^{2 算法是ε-差分私有的。在真实数据集上进行的大量实验表明，我们的PFS ^{2 算法可以私密地找到频率很高的序列。}}}

著录项

期刊名称 other
作者
Shengzhi Xu; Sen Su; Xiang Cheng; Zhengyi Li; Li Xiong;
展开▼
作者单位

展开▼
年(卷),期 -1(2015),-1
年度 -1
页码 1035–1046
总页数 36
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Differentially Private Frequent Sequence Mining [J] . Shengzhi Xu, Xiang Cheng, Sen Su, IEEE Transactions on Knowledge and Data Engineering . 2016,第11期

机译：差分私有频繁序列挖掘
2. Differentially private maximal frequent sequence mining [J] . Xiang Cheng, Sen Su, Shengzhi Xu, Computers & Security . 2015,第NOVa期

机译：差分私有最大频繁序列挖掘
3. FCloSM, FGenSM: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy [J] . Bac Le, Hai Duong, Tin Truong, Knowledge and information systems . 2017,第1期

机译：FCLOSM，FGENSM：使用局部修剪策略进行频繁闭合和发电机序列的两种有效算法
4. Differentially private frequent sequence mining via sampling-based candidate pruning [C] . Shengzhi Xu, Sen Su, Xiang Cheng, IEEE international conference on data engineering . 2015

机译：通过基于采样的候选修剪进行差分私有频繁序列挖掘
5. Mining Frequent Sequences in One Database Scan Using Distributed Computers. [D] . Brajczuk, Dale Allan. 2011

机译：使用分布式计算机在一次数据库扫描中挖掘频繁序列。
6. Differentially Private Frequent Subgraph Mining [O] . Shengzhi Xu, Sen Su, Li Xiong, -1

机译：差分私有频繁子图挖掘
7. Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning [O] . Shengzhi Xu, Sen Su, Xiang Cheng, 2016

机译：基于抽样的候选剪枝差分私有频繁序列挖掘

Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅