Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis

En Tzu Wang; Arbee L. P. Chen

Abstract

Mining frequent itemsets over data streams has attracted much research attention in recent years. We previously developed a hash-based approach for mining frequent itemsets over a single data stream. In this paper, we extend that approach to mine global frequent itemsets from a collection of data streams distributed at distinct remote sites. To speed up the mining process, we make the first attempt to address a new problem: continuously maintaining a global synopsis for the union of all the distributed streams. Mining results can then be produced on demand by directly processing the maintained global synopsis. Instead of collecting and processing all the data at a central server, which may waste the computation resources of the remote sites, we perform distributed computations over the data streams. The proposed distributed computation framework consists of two communication strategies and one merging operation. The communication strategies are designed according to an accuracy guarantee on the mining results and determine when and what the remote sites transmit to the central server (called the coordinator). The merging operation merges the information received from the remote sites into the global synopsis maintained at the coordinator. Together, the strategies and the merging operation keep the global synopsis continuously maintained. Based on this global synopsis, we propose a mining algorithm for finding global frequent itemsets. We also provide correctness guarantees for the communication strategies and the merging operation, and an analysis of the accuracy guarantee of the mining algorithm. Finally, a series of experiments on synthetic datasets and a real dataset demonstrates the effectiveness and efficiency of the distributed computation framework.
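The abstract does not specify the two communication strategies, the merging operation, or the hash-based synopsis, so the following is only a minimal sketch of the general coordinator pattern it describes, under assumptions chosen for illustration. It is not the authors' method. Each hypothetical `RemoteSite` keeps exact local counts for itemsets of size up to `max_size` (a plain dictionary, not a hash-based synopsis) and transmits an itemset's unreported count increment only when it exceeds `epsilon * n`, where `n` is the site's local stream length. The hypothetical `Coordinator.merge` adds the increments into the global synopsis. Because every site's unreported count for any itemset stays at most `epsilon * n_i`, the synopsis underestimates each global count by at most `epsilon * N`; reporting itemsets whose estimate reaches `(support - epsilon) * N` therefore misses no truly frequent itemset. The sketch also assumes the coordinator learns the exact stream lengths `n_i` at query time (here it simply reads them).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def subsets(transaction: Iterable[str], max_size: int) -> List[Itemset]:
    """All non-empty itemsets of size <= max_size contained in a transaction."""
    items = sorted(set(transaction))
    return [frozenset(c) for k in range(1, max_size + 1) for c in combinations(items, k)]


class Coordinator:
    """Hypothetical coordinator holding the global synopsis (summed reported counts)."""

    def __init__(self) -> None:
        self.synopsis: Counter = Counter()
        self.messages = 0

    def merge(self, deltas: Dict[Itemset, int]) -> None:
        # Illustrative merging operation: add the reported increments to the global counts.
        self.synopsis.update(deltas)
        self.messages += 1


class RemoteSite:
    """Hypothetical remote site using a simple threshold rule to decide what to send."""

    def __init__(self, coordinator: Coordinator, epsilon: float, max_size: int = 2) -> None:
        self.coordinator = coordinator
        self.epsilon = epsilon
        self.max_size = max_size
        self.n = 0                         # local stream length
        self.local: Counter = Counter()    # exact local counts
        self.reported: Counter = Counter() # counts already sent to the coordinator

    def process(self, transaction: Iterable[str]) -> None:
        self.n += 1
        deltas: Dict[Itemset, int] = {}
        for x in subsets(transaction, self.max_size):
            self.local[x] += 1
            unreported = self.local[x] - self.reported[x]
            # Send only when the unreported increment exceeds epsilon * n; this keeps
            # every itemset's unreported count <= epsilon * n at all times.
            if unreported > self.epsilon * self.n:
                deltas[x] = unreported
                self.reported[x] = self.local[x]
        if deltas:
            self.coordinator.merge(deltas)  # at most one message per transaction


def mine(coordinator: Coordinator, sites: List[RemoteSite],
         support: float, epsilon: float) -> Set[Itemset]:
    """Report itemsets whose estimated global count reaches (support - epsilon) * N."""
    total = sum(site.n for site in sites)
    threshold = (support - epsilon) * total
    return {x for x, c in coordinator.synopsis.items() if c >= threshold}


coord = Coordinator()
eps = 0.05
sites = [RemoteSite(coord, eps) for _ in range(3)]
streams = [
    [["a", "b"], ["a", "c"], ["a", "b", "c"]] * 20,
    [["a", "b"], ["b"], ["a"]] * 20,
    [["c"], ["a", "b"], ["b", "c"]] * 20,
]
for site, stream in zip(sites, streams):
    for t in stream:
        site.process(t)

print(sorted(sorted(x) for x in mine(coord, sites, support=0.5, epsilon=eps)))
# prints [['a'], ['b']]
```

In this toy run there are N = 180 transactions, so the reporting threshold is about 81. The items `a` and `b` (true count 120 each) are reported, while `c` and `{a, b}` (true count 80 each) fall below it. The trade-off in this illustrative rule is that a larger `epsilon` lets sites stay silent longer, at the cost of a wider error band and more false positives between `(support - epsilon) * N` and `support * N`.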

Bibliographic details

Source: Data Mining and Knowledge Discovery, 2011, No. 2, pp. 252-299 (48 pages)
Authors: En Tzu Wang; Arbee L. P. Chen
Full-text format: PDF
Language: English

Similar literature

1. Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis [J]. Wang E.T., Chen A.L.P. Data Mining and Knowledge Discovery, 2011, No. 2.
2. Efficient subset-lattice algorithms for mining closed frequent itemsets and maximal frequent itemsets in data streams [J]. Ye-In Chang, Chia-En Li, Wei-Hau Peng. International Journal of Electrical Engineering: Transactions of the Chinese Institute of Engineers, Series E, 2013, No. 2.
3. Mining frequent items and itemsets from distributed data streams for emergency detection and management [J]. Altomare Albino, Cesario Eugenio, Talia Domenico. Journal of Ambient Intelligence and Humanized Computing, 2017, No. 1.
4. Variable support mining of frequent itemsets over data streams using synopsis vectors [C]. Ming-Yen Lin, Sue-Chen Hsueh, Sheng-Kun Hwang. Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2006), April 9-12, 2006, Singapore.
5. Mining frequent itemsets from uncertain data: extensions to constrained mining and stream mining [D]. Hao, Boyu. 2010.
6. Genetic programming and frequent itemset mining to identify feature selection patterns of iEEG and fMRI epilepsy data [O]. Otis Smart, Lauren Burrell.
7. Continuous prediction of closed frequent itemsets from high speed distributed data streams using parallel mining on manifold windows with varying size [O]. V. SiddaReddy, T.V. Rao, A. Govardhan. 2014.