Journal of Grid Computing

A Multi-Domain Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams

Abstract

Real-time analysis of distributed data streams is a challenging task, since it requires scalable solutions that can handle data generated very rapidly by multiple sources. This paper presents the design and implementation of an architecture for analyzing data streams in distributed environments. In particular, the analysis computes the items and itemsets whose frequency exceeds a given threshold. The mining approach is hybrid: frequent items are calculated in a single pass using a sketch algorithm, while frequent itemsets are calculated by a further multi-pass analysis. The architecture combines parallel and distributed processing to keep pace with the rate of the distributed data streams. To keep computation close to the data, miners are distributed among the domains where the data streams are generated. The paper reports experimental results obtained with a prototype of the architecture, tested on a Grid composed of three domains, each handling one data stream.
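The hybrid approach described in the abstract can be illustrated with a minimal sketch. The abstract does not name the sketch algorithm, so a Count-Min Sketch is used here as a stand-in. The class and function names, the toy streams, and the threshold are all hypothetical. Merging per-domain sketches by cell-wise addition is likewise an assumption about how per-domain results might be combined, not the paper's documented method. For brevity, the candidate items are collected by re-reading the toy data, and the multi-pass stage is reduced to a single extra pass that counts pairs.

```python
import hashlib
from collections import Counter
from itertools import combinations


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount an item."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a row-salted hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._columns(item))

    def merge(self, other):
        # Sketches with the same shape and hashes combine by cell-wise addition.
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


def sketch_stream(transactions):
    """Single pass over one domain's stream: count transactions per item."""
    sketch = CountMinSketch()
    for transaction in transactions:
        for item in set(transaction):
            sketch.add(item)
    return sketch


def frequent_pairs(streams, frequent_items, min_count):
    """Further pass: count only pairs built from items already found frequent."""
    counts = Counter()
    for transactions in streams:
        for transaction in transactions:
            kept = sorted(set(transaction) & frequent_items)
            counts.update(combinations(kept, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Three hypothetical domains, each producing its own small stream.
domain_streams = [
    [["a", "b", "c"], ["a", "b"], ["a", "d"]],
    [["a", "b"], ["b", "c"], ["a", "b", "e"]],
    [["a", "c"], ["a", "b"], ["f"]],
]

# Each domain sketches its own stream; the sketches are then merged.
global_sketch = CountMinSketch()
for stream in domain_streams:
    global_sketch.merge(sketch_stream(stream))

min_count = 4
# Simplification: candidates are gathered by re-reading the toy data.
candidates = {item for stream in domain_streams for t in stream for item in t}
frequent_items = {i for i in candidates if global_sketch.estimate(i) >= min_count}
pairs = frequent_pairs(domain_streams, frequent_items, min_count)
```

In this toy data, item `a` appears in 7 transactions and `b` in 6, so both are guaranteed to pass the threshold because a Count-Min Sketch can overestimate but never underestimate. The pair `("a", "b")` is then counted exactly and occurs 5 times across the three streams.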
