Parallel Associative Classification Data Mining Frameworks Based MapReduce

Fadi Thabtah; Suhel Hammoud; Hussein Abdel-Jaber

Abstract

Associative classification (AC) is a research topic in data mining that integrates association rules with classification to build classifiers. Since the Classification-based Association Rule algorithm (CBA) was introduced, most of its successors have been developed to improve either CBA's prediction accuracy or the search for frequent ruleitems in the rule discovery step. Both steps demand considerable processing time and memory, especially for large training data sets or a low minimum support threshold. In this paper, we address the problem of mining large training data sets by proposing a new learning method that repeatedly transforms data between line and item spaces to quickly discover frequent ruleitems, generate rules, and then rank and prune them. This learning method has been implemented in a parallel MapReduce (MR) algorithm called MRMCAR, which can be considered the first parallel AC algorithm in the literature. The method can be used in the different steps of any AC or association rule mining algorithm, and it scales well compared with current horizontal and vertical methods. Two versions of the learning method (Weka and Hadoop) have been implemented, and a number of experiments have been conducted on different data sets. The comparisons are based on classification accuracy and on the time the algorithm requires for data initialization, frequent ruleitem discovery, rule generation and rule pruning. The results show that MRMCAR outperforms both current AC mining algorithms and rule-based classification algorithms in classification accuracy.
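
The abstract gives no pseudocode for the line/item-space learning method, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not MRMCAR itself. It assumes that "line space" is the usual row-wise (horizontal) layout and that "item space" is a vertical layout mapping each (attribute, value) item to the set of line ids containing it. Frequent ruleitems are found level-wise by intersecting line-id sets, and the class labels of the covered lines are then counted to obtain support and confidence. The ranking order (confidence, then support, then fewer conditions) and the CBA-style database-coverage pruning are assumptions borrowed from common AC practice. The function names `to_item_space`, `mine_rules` and `rank_and_prune` are hypothetical.

```python
from collections import Counter, defaultdict


def to_item_space(rows):
    """Line space -> item space: map each (attribute, value) item to the ids of the lines containing it."""
    item_space = defaultdict(set)
    for line_id, row in enumerate(rows):
        for attr, value in row.items():
            item_space[(attr, value)].add(line_id)
    return item_space


def mine_rules(rows, labels, min_supp, min_conf, max_len=2):
    """Level-wise ruleitem discovery by line-id intersection (an assumed scheme, not MRMCAR's exact one)."""
    n = len(rows)
    items = to_item_space(rows)
    rules = []  # (condset, class, support, confidence)
    level = {(item,): lines for item, lines in items.items()}
    for _ in range(max_len):
        frontier = {}
        for condset, lines in level.items():
            # Item space -> line space: count the class labels of the covered lines.
            cls, hits = Counter(labels[i] for i in lines).most_common(1)[0]
            if hits / n >= min_supp:
                # If even the majority class were infrequent, every ruleitem on this condset
                # and on its supersets would be infrequent (support is anti-monotone),
                # so only condsets kept here are extended.
                frontier[condset] = lines
                if hits / len(lines) >= min_conf:
                    rules.append((condset, cls, hits / n, hits / len(lines)))
        # Extend each surviving condset with an item on a later attribute (one value per attribute).
        level = {
            condset + (item,): lines & item_lines
            for condset, lines in frontier.items()
            for item, item_lines in items.items()
            if item[0] > condset[-1][0] and lines & item_lines
        }
    return rules


def rank_and_prune(rules, rows, labels):
    """Rank rules, then keep those that survive CBA-style database coverage (assumed pruning)."""
    # Assumed order: higher confidence, then higher support, then fewer conditions.
    ranked = sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0])))
    remaining = set(range(len(rows)))
    kept = []
    for rule in ranked:
        condset, cls = rule[0], rule[1]
        covered = {i for i in remaining if all(rows[i].get(a) == v for a, v in condset)}
        # Keep the rule only if it correctly classifies at least one remaining line,
        # then remove every line it covers.
        if any(labels[i] == cls for i in covered):
            kept.append(rule)
            remaining -= covered
            if not remaining:
                break
    return kept
```

In this layout, once each item's line ids are known, the support of a longer condition set is obtained from a set intersection rather than from a new pass over the training rows.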

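To illustrate how such a method can be distributed, the sketch below simulates the map, shuffle and reduce phases of a single MapReduce job in plain Python rather than on Hadoop. How the first step is split is an assumption here, not the published MRMCAR job design: each mapper reads one input split of labelled lines and emits ((item, class), line id) pairs, which converts line space into item space; the shuffle groups the pairs by key; and each reducer keeps a ruleitem only if its line-id list meets the minimum support. Splits are processed sequentially here, whereas Hadoop would run mappers and reducers in parallel on different nodes. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Mapper: emit ((item, class), line_id) for every item on every labelled line in one split."""
    for line_id, row, label in split:
        for attr, value in row.items():
            yield ((attr, value), label), line_id


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, line_id in pairs:
        groups[key].append(line_id)
    return groups


def reduce_phase(key, line_ids, n, min_supp):
    """Reducer: keep a ruleitem (with its line ids) only if it meets the minimum support."""
    if len(line_ids) / n >= min_supp:
        yield key, sorted(line_ids)


def frequent_1_ruleitems(splits, min_supp):
    """Run one simulated MapReduce job over all splits (sequentially, unlike Hadoop)."""
    n = sum(len(split) for split in splits)
    groups = shuffle(chain.from_iterable(map_phase(split) for split in splits))
    return dict(chain.from_iterable(
        reduce_phase(key, ids, n, min_supp) for key, ids in groups.items()))
```

Because each split is mapped independently and each key is reduced independently, both phases parallelise naturally; the sketch assumes the total number of training lines `n` is known to the reducers in advance.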

Bibliographic Details

Source
Parallel Processing Letters, 2015, No. 2, 39 pages
Authors
Fadi Thabtah; Suhel Hammoud; Hussein Abdel-Jaber
Indexing Information
Original format: PDF
Language: English
Chinese Library Classification: Computer software
Keywords
Associative classification; Data mining; Distributed Tasks; Hadoop; MapReduce; Parallel mining;

Similar Documents

1. Parallel Associative Classification Data Mining Frameworks Based MapReduce [J]. Fadi Thabtah, Suhel Hammoud, Hussein Abdel-Jaber. Parallel Processing Letters, 2015, No. 2.
2. Cost-sensitive incremental Classification under the MapReduce framework for Mining Imbalanced Massive Data Streams [J]. Huang Yuwen. Journal of Discrete Mathematical Sciences and Cryptography, 2015, No. 1a2.
3. Knowledge process of health big data using MapReduce-based associative mining [J]. So-Young Choi, Kyungyong Chung. Personal and Ubiquitous Computing, 2020, No. 5.
4. A cloud-based data mining framework for improved clinical diagnosis through parallel classification [C]. Y. V. Lokeswari, Shomona Gracia Jacob. International Conference on Applied and Theoretical Computing and Communication Technology, 2015.
5. An object model framework, runtime environment support, and database system software for a multiple instruction stream associative model of parallel computation [D]. Scherger, Michael C. 2005.
6. COMBImage2: a parallel computational framework for higher-order drug combination analysis that includes automated plate design matched filter based object counting and temporal data mining [O]. Efthymia Chantzi, Malin Jarvius, Mia Niklasson. 2019.
7. A paralleled big data algorithm with mapreduce framework for mining twitter data [O]. Bing L, Chan KCC. 2015.
