...
首页> 外文期刊>Computers & Security >A three-phase approach to differentially private crucial patterns mining over data streams
【24h】

A three-phase approach to differentially private crucial patterns mining over data streams

机译:通过数据流挖掘差分私有关键模式的一种三阶段方法

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Frequent patterns mining over transactional data streams is an important task for a wide range of online data mining applications. Nevertheless, mining crucial patterns is even more appropriate than frequent patterns over transactional data streams, because crucial patterns are the subset of frequent patterns with the minimum storage cost and information lossless extraction. In this paper, we argue that the privacy of mining crucial patterns from data streams (i.e., aggregating information from individuals) is more likely to be leaked than static scenarios, due to successive releases. However, to the best of our knowledge, there is little work on differential privacy in continuously publishing crucial patterns from data streams. To this end, this paper proposes a real-time differentially private crucial pattern computation algorithm which designs a three-phase mechanism (i.e., the preprocessing phase, the deep-going calculation phase, and the noise-mining phase) at every timestamp. The algorithm is able to not only improve the utility of the crucial pattern statistics as much as possible which satisfy differential privacy, but also reduce the average mining time without incurring high maintenance cost according to the feature of crucial patterns. To reduce the number of calls to crucial pattern computation algorithm, we design two-dissimilarity formulas according to the relationship between frequent patterns and crucial patterns to decide to return either low noisy statistic or accurately approximated statistic in the first two phases. When the low noisy statistic needs to be turned, the algorithm goes into the noise-mining phase. To obtain private crucial patterns, we first filter crucial pattern candidate set by perturbing the scoring functions, and then add independent Laplace noise to their supports. Finally, we conduct extensive experiments on dense datasets and sparse datasets to show the effectiveness and efficiency of our algorithm. (C) 2018 Elsevier Ltd. All rights reserved.
机译:对于各种在线数据挖掘应用程序而言,在事务数据流上进行频繁模式挖掘是一项重要任务。但是,挖掘关键模式比事务数据流上的频繁模式更为合适,因为关键模式是具有最小存储成本和信息无损提取的频繁模式的子集。在本文中,我们认为由于连续的发布,从静态数据场景中挖掘关键模式的隐私(即从个人收集信息)比静态场景更容易泄露。但是,据我们所知,在连续发布数据流中的关键模式方面,关于差异隐私的工作很少。为此,本文提出了一种实时差分私有关键模式计算算法,该算法在每个时间戳上设计了一个三相机制(即预处理阶段,深层计算阶段和噪声挖掘阶段)。该算法不仅可以最大程度地提高满足差异性隐私的关键模式统计的效用,而且可以根据关键模式的特点减少平均挖掘时间,而又不增加维护成本。为了减少对关键模式计算算法的调用次数,我们根据频繁模式与关键模式之间的关系设计了两个相异公式,以决定在前两个阶段返回低噪声统计量或精确近似统计量。当需要降低低噪声统计量时,该算法将进入噪声挖掘阶段。为了获得专用的关键码型,我们首先通过扰动评分函数来过滤关键码型候选集,然后将独立的拉普拉斯噪声添加到它们的支持中。最后,我们在密集数据集和稀疏数据集上进行了广泛的实验,以证明该算法的有效性和效率。 (C)2018 Elsevier Ltd.保留所有权利。

著录项

  • 来源
    《Computers & Security》 |2019年第5期|30-48|共19页
  • 作者单位

    Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin, Peoples R China|Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

    Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

    Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

    Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin, Peoples R China|Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

    Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin, Peoples R China|Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

  • 收录信息 美国《科学引文索引》(SCI);美国《工程索引》(EI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Differential privacy; Crucial patterns; Data streams; Privacy leakage; Data mining;

    机译:差异隐私;关键模式;数据流;隐私泄漏;数据挖掘;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号