首页> 外文期刊>Computers & Security >Differentially private maximal frequent sequence mining
【24h】

Differentially private maximal frequent sequence mining

机译:差分私有最大频繁序列挖掘

获取原文
获取原文并翻译 | 示例

摘要

In this paper, we study the problem of designing a differentially private algorithm for mining maximal frequent sequences, which can not only achieve high data utility and a high degree of privacy, but also provide high time efficiency. To solve this problem, we present a new differentially private algorithm, which is referred to as DP-MFSM. DP-MFSM consists of three phases: pre-processing phase, expected frequent sequence mining (ESM) phase, and candidate extraction and verification (CEV) phase. Specifically, in the pre-processing phase, we first extract some statistical information from the input database, and use the extracted information to determine the values of some variables which will be used in the ESM phase. Then, in the ESM phase, we randomly partition the input database into several sub-databases, and use a partition-based ESM technique to find expected frequent sequences, which are a subset of candidate frequent sequences and more likely to be frequent. At last, in the CEV phase, we extract candidate maximal frequent sequences from the discovered expected frequent sequences, and use a splitting-based technique to verify which candidates are actually frequent in the input database. Through privacy analysis, we show that our DP-MFSM algorithm is ε-differentially private. Extensive experiments on real-world datasets illustrate that our DP-MFSM algorithm can substantially outperform alternative approaches.
机译:本文研究了一种设计最大挖掘频繁序列的差分私有算法的问题,该算法不仅可以实现较高的数据利用率和高度的保密性,而且可以提供较高的时间效率。为了解决这个问题,我们提出了一种新的差分私有算法,称为DP-MFSM。 DP-MFSM包含三个阶段:预处理阶段,预期频繁序列挖掘(ESM)阶段和候选者提取和验证(CEV)阶段。具体来说,在预处理阶段,我们首先从输入数据库中提取一些统计信息,然后使用提取的信息来确定将在ESM阶段中使用的某些变量的值。然后,在ESM阶段,我们将输入数据库随机分为几个子数据库,并使用基于分区的ESM技术来查找预期的频繁序列,该序列是候选频繁序列的子集,并且更有可能是频繁的。最后,在CEV阶段,我们从发现的预期频繁序列中提取候选最大频繁序列,并使用基于拆分的技术来验证输入数据库中哪些候选实际上是频繁的。通过隐私分析,我们表明我们的DP-MFSM算法是ε-差分私有的。在真实数据集上进行的大量实验表明,我们的DP-MFSM算法可以大大胜过其他方法。

著录项

  • 来源
    《Computers & Security》 |2015年第11期|175-192|共18页
  • 作者单位

    State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China;

    State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China;

    State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China;

    State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China;

    State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Differential privacy; Maximal frequent sequence mining; Frequent sequence mining; Length reduction; Threshold relaxation;

    机译:差异性隐私;最大频繁序列挖掘;频繁的序列挖掘;长度减少;阈值松弛;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号