Parallel and Distributed Frequent Itemset Mining on Dynamic Datasets


Abstract

Traditional methods for data mining typically assume that data is centralized and static. This assumption is no longer tenable. Such methods waste computational and I/O resources when the data is dynamic, and they impose excessive communication overhead when the data is distributed. As a result, slow response times hamper the knowledge discovery process. Efficient implementation of incremental data mining ideas in distributed computing environments is thus becoming crucial for ensuring scalability and facilitating knowledge discovery when data is dynamic and distributed. In this paper we address this issue in the context of frequent itemset mining, an important data mining task. Frequent itemsets are most often used to generate correlations and association rules, but more recently they have been used in such far-reaching domains as bioinformatics and e-commerce applications. We first present an efficient algorithm that dynamically maintains the required information in the presence of data updates without examining the entire dataset. We then show how to parallelize the incremental algorithm so that it can mine frequent itemsets asynchronously. We also propose a distributed algorithm that imposes low communication overhead when mining distributed datasets. Several experiments confirm that our algorithm yields substantial improvements in execution time.
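The incremental idea above, keeping mining state current as transactions arrive instead of rescanning the whole dataset, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it swaps in a generic vertical (item-to-transaction-id-set) representation with Eclat-style tidset intersection, and the class `IncrementalTidsetIndex` and its methods are hypothetical names. Adding a batch reads only the new transactions, but the mining pass still traverses the whole index, so only the data-maintenance step is incremental here. Supports are absolute counts.

```python
from collections import defaultdict


class IncrementalTidsetIndex:
    """Hypothetical helper (not from the paper): a vertical item -> tidset index
    that absorbs new transactions without rereading earlier ones."""

    def __init__(self):
        self.tidsets = defaultdict(set)  # item -> ids of transactions containing it
        self.n_transactions = 0

    def add_batch(self, transactions):
        # Only the incoming transactions are read; existing tidsets grow in place.
        for items in transactions:
            tid = self.n_transactions
            self.n_transactions += 1
            for item in set(items):
                self.tidsets[item].add(tid)

    def frequent_itemsets(self, min_support):
        """Return {itemset tuple: support} for all itemsets whose absolute
        support is at least min_support (Eclat-style depth-first search)."""
        result = {}
        frequent_items = sorted(
            item for item, tids in self.tidsets.items() if len(tids) >= min_support
        )

        def extend(prefix, prefix_tids, candidates):
            for pos, item in enumerate(candidates):
                # Support of prefix + item = size of the tidset intersection.
                tids = prefix_tids & self.tidsets[item]
                if len(tids) >= min_support:
                    itemset = prefix + (item,)
                    result[itemset] = len(tids)
                    # Only later candidates are tried, so each itemset is visited once.
                    extend(itemset, tids, candidates[pos + 1:])

        # Start from the set of all transaction ids, so the first intersection
        # yields each single item's own tidset.
        extend((), set(range(self.n_transactions)), frequent_items)
        return result


if __name__ == "__main__":
    index = IncrementalTidsetIndex()
    index.add_batch([["a", "b"], ["a", "c"], ["a", "b", "c"]])
    print(index.frequent_itemsets(min_support=2))
    index.add_batch([["b", "c"]])  # an update: only this transaction is read
    print(index.frequent_itemsets(min_support=2))
```

With `min_support=2`, the itemset {b, c} is not frequent after the first three transactions (it occurs only in the third); after the one-transaction update its support reaches 2, and the index reflects this without rereading the original batch.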

Bibliographic record

Source: International Conference on High Performance Computing, 2003, 10 pages
Authors: Adriano Veloso; Matthew Eric Otey; Srinivasan Parthasarathy; Wagner Meira Jr.
Original format: PDF
Chinese Library Classification (CLC): TP302-532

Similar literature

1. An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets [J]. Vanahalli Manjunath K., Patil Nagamma. Information Sciences: An International Journal. 2019.
2. Algorithms for mining frequent itemsets in static and dynamic datasets [J]. R. Hernandez-Leon, J. Hernandez-Palancar, Jesus A. Carrasco-Ochoa. Intelligent Data Analysis. 2010, no. 3.
3. Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset [J]. Manjunath K Vanahalli, Nagamma Patil. Journal of Parallel and Distributed Computing. 2020, issue Octa.
4. Parallel and Distributed Frequent Itemset Mining on Dynamic Datasets [C]. Adriano Veloso, Matthew Eric Otey, Srinivasan Parthasarathy. International Conference on High Performance Computing. 2003.
5. Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining [D]. Hao, Boyu. 2010.
6. Unravelling associations between unassigned mass spectrometry peaks with frequent itemset mining techniques [O]. Trung Nghia Vu, Aida Mrzic, Dirk Valkenborg. 2014.
7. Parallel and distributed frequent itemset mining on dynamic datasets [O]. Adriano Veloso, Matthew Erick Otey, Srinivasan Parthasarathy. 2003.
