A three-phase approach to differentially private crucial patterns mining over data streams

Wang Jinyan; Liu Chen; Fu Xingcheng; Luo Xudong; Li Xianxian

首页> 外文期刊>Computers & Security >A three-phase approach to differentially private crucial patterns mining over data streams

【24h】

A three-phase approach to differentially private crucial patterns mining over data streams

机译：通过数据流挖掘差分私有关键模式的一种三阶段方法

获取原文

获取原文并翻译 | 示例

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Frequent patterns mining over transactional data streams is an important task for a wide range of online data mining applications. Nevertheless, mining crucial patterns is even more appropriate than frequent patterns over transactional data streams, because crucial patterns are the subset of frequent patterns with the minimum storage cost and information lossless extraction. In this paper, we argue that the privacy of mining crucial patterns from data streams (i.e., aggregating information from individuals) is more likely to be leaked than static scenarios, due to successive releases. However, to the best of our knowledge, there is little work on differential privacy in continuously publishing crucial patterns from data streams. To this end, this paper proposes a real-time differentially private crucial pattern computation algorithm which designs a three-phase mechanism (i.e., the preprocessing phase, the deep-going calculation phase, and the noise-mining phase) at every timestamp. The algorithm is able to not only improve the utility of the crucial pattern statistics as much as possible which satisfy differential privacy, but also reduce the average mining time without incurring high maintenance cost according to the feature of crucial patterns. To reduce the number of calls to crucial pattern computation algorithm, we design two-dissimilarity formulas according to the relationship between frequent patterns and crucial patterns to decide to return either low noisy statistic or accurately approximated statistic in the first two phases. When the low noisy statistic needs to be turned, the algorithm goes into the noise-mining phase. To obtain private crucial patterns, we first filter crucial pattern candidate set by perturbing the scoring functions, and then add independent Laplace noise to their supports. Finally, we conduct extensive experiments on dense datasets and sparse datasets to show the effectiveness and efficiency of our algorithm. (C) 2018 Elsevier Ltd. All rights reserved.

机译：对于各种在线数据挖掘应用程序而言，在事务数据流上进行频繁模式挖掘是一项重要任务。但是，挖掘关键模式比事务数据流上的频繁模式更为合适，因为关键模式是具有最小存储成本和信息无损提取的频繁模式的子集。在本文中，我们认为由于连续的发布，从静态数据场景中挖掘关键模式的隐私（即从个人收集信息）比静态场景更容易泄露。但是，据我们所知，在连续发布数据流中的关键模式方面，关于差异隐私的工作很少。为此，本文提出了一种实时差分私有关键模式计算算法，该算法在每个时间戳上设计了一个三相机制（即预处理阶段，深层计算阶段和噪声挖掘阶段）。该算法不仅可以最大程度地提高满足差异性隐私的关键模式统计的效用，而且可以根据关键模式的特点减少平均挖掘时间，而又不增加维护成本。为了减少对关键模式计算算法的调用次数，我们根据频繁模式与关键模式之间的关系设计了两个相异公式，以决定在前两个阶段返回低噪声统计量或精确近似统计量。当需要降低低噪声统计量时，该算法将进入噪声挖掘阶段。为了获得专用的关键码型，我们首先通过扰动评分函数来过滤关键码型候选集，然后将独立的拉普拉斯噪声添加到它们的支持中。最后，我们在密集数据集和稀疏数据集上进行了广泛的实验，以证明该算法的有效性和效率。（C）2018 Elsevier Ltd.保留所有权利。

著录项

来源
《Computers & Security》 |2019年第5期|30-48|共19页
作者
Wang Jinyan; Liu Chen; Fu Xingcheng; Luo Xudong; Li Xianxian;
展开▼
作者单位

Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin, Peoples R China|Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin, Peoples R China|Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin, Peoples R China|Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Guilin, Peoples R China;

展开▼
收录信息美国《科学引文索引》(SCI);美国《工程索引》(EI);
原文格式 PDF
正文语种 eng
中图分类
关键词
Differential privacy; Crucial patterns; Data streams; Privacy leakage; Data mining;

机译：差异隐私;关键模式;数据流;隐私泄漏;数据挖掘;

相似文献

外文文献
中文文献
专利

1. Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach [J] . Karim Md. Rezaul, Cochez Michael, Beyan Oya Deniz, Information Sciences: An International Journal . 2018,第期

机译：在交易数据库和动态数据流中采集最大频繁模式：基于火花的方法
2. Mining sequential patterns from data streams: a centroid approach [J] . Alice Marascu, Florent Masseglia Journal of Intelligent Information Systems . 2006,第3期

机译：从数据流中挖掘顺序模式：质心方法
3. An efficient outlier detection approach on weighted data stream based on minimal rare pattern mining [J] . Cai Saihua, Sun Ruizhi, Hao Shangbo, Communications, China . 2019,第10期

机译：基于最小稀有模式挖掘的加权数据流的高效异常检测方法
4. Fast Lossless Frequent Itemset Mining in Data Streams using Crucial Patterns [C] . Ariyam Das, Carlo Zaniolo SIAM International Conference on Data Mining . 2016

机译：使用关键模式在数据流中的快速无损频繁罚款
5. Mining frequent sequential patterns in data streams using SSM-algorithm. [D] . Monwar, Mostafa. 2005

机译：使用SSM算法在数据流中挖掘频繁的顺序模式。
6. Hyper-structure mining of frequent patterns in uncertain data streams [O] . Chandima HewaNadungodage, Yuni Xia, Jaehwan John Lee, -1

机译：不确定数据流中频繁模式的超结构挖掘
7. Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach [O] . Md. Rezaul Karim, Michael Cochez, Oya Deniz Beyan, 2018

机译：在交易数据库和动态数据流中采集最大频繁模式：基于火花的方法
8. Terrorist Activity Evaluation and Pattern Detection (TAE&PD) in Afghanistan: A Knowledge Discovery and Data Mining (KDDM) Approach for Counter-Terrorism. [R] . Pou, J., Duffany, J., Cruz, A. 2012

机译：阿富汗的恐怖活动评估和模式检测（TaE和pD）：反恐怖主义的知识发现和数据挖掘（KDDm）方法。

A three-phase approach to differentially private crucial patterns mining over data streams

摘要

著录项

相似文献

相关主题

期刊订阅