Home > Foreign Conference Proceedings > International Conference on Circuit, Power and Computing Technologies > Parallel frequent itemset mining with spark RDD framework for disease prediction
Parallel frequent itemset mining with spark RDD framework for disease prediction


Abstract

The aim of frequent itemset mining is to find all common sets of items, defined as those itemsets that occur with at least a minimum support. There are many well-known algorithms for frequent itemset mining, among them Apriori, Eclat, RElim, SaM, and FP-Growth. Although each of these algorithms is well formed and works in different scenarios, their main drawback is that they were designed to operate on small chunks of data. This limitation reflects the era in which they were developed, before the notion of big data had taken hold, so these algorithms do not perform well at the scale of today's datasets. We therefore propose a new approach that implements these well-known algorithms in a parallelized manner so that they can handle such data effectively. The proposed work parallelizes Faster-IAPI, a dynamic frequent itemset mining algorithm, with the Spark RDD framework. Apache Spark was selected because it overcomes a limitation of the Hadoop architecture, which was designed to handle big data processing in a parallelized manner but does not handle iterative algorithms well; Spark rectifies this drawback. In this approach the algorithm is applied to find correlations between different symptoms of patients in a faster and more efficient manner, and provides support for predicting the occurrence of a disease based on its symptoms.
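The abstract does not detail the Faster-IAPI algorithm itself, so as an illustration of the underlying idea — finding all itemsets whose support meets a minimum threshold, here over toy patient-symptom data — a minimal, non-parallel Apriori-style sketch might look like the following (the dataset and all names are hypothetical):

```python
from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Apriori-style sketch: return every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    n = len(transactions)
    tx_sets = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets seed the search.
    counts = Counter(item for t in tx_sets for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c / n >= min_support}
    result = {s: counts[next(iter(s))] / n for s in current}

    k = 2
    while current:
        # Candidate k-itemsets: unions of frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = set()
        for cand in candidates:
            support = sum(1 for t in tx_sets if cand <= t) / n
            if support >= min_support:
                current.add(cand)
                result[cand] = support
        k += 1
    return result

# Toy dataset: each transaction is one patient's set of symptoms.
patients = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "cough", "headache"},
]
freq = frequent_itemsets(patients, min_support=0.5)
# {fever, cough} appears in 3 of 4 transactions, so its support is 0.75.
```

In a Spark RDD implementation such as the one the paper proposes, the per-candidate support counting would be distributed across partitions of the transaction set rather than computed in a single loop as above.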

