Exploiting Apache Flink's iteration capabilities for distributed Apriori: Community detection problem as an example

Abstract

Extracting useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent patterns) is the most important part of association rule mining. Many algorithms exist to find frequent patterns, but the Apriori algorithm remains a preferred choice due to its ease of implementation and its natural tendency to be parallelized. Many single-machine Apriori variants exist, but the massive amount of data available these days exceeds the capacity of a single machine. Therefore, to meet the demands of this ever-growing data, there is a need for an Apriori algorithm that runs on multiple machines. For this type of distributed application, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks using the MapReduce approach for distributed storage and distributed processing of huge datasets on clusters built from commodity hardware. However, the heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, two platforms, namely Spark and Flink, have attracted a lot of attention because of their built-in support for distributed computations. Earlier we proposed a reduced-Apriori algorithm on the Spark platform which outperforms parallel Apriori, firstly because of the use of Spark and secondly because of the improvement we proposed to standard Apriori. The present work is therefore a natural sequel to our earlier work: it targets implementing, testing and benchmarking Apriori on Apache Flink and compares it with the Spark implementation. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori algorithm on Flink. We also use the community detection graph mining problem as a test case to demonstrate our implementations.
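
To make the iterative structure mentioned in the abstract concrete, the following is a minimal single-machine sketch of textbook Apriori in Python. It is not the authors' Flink or Spark implementation, and it omits their reduced-Apriori improvement; the names `apriori`, `transactions` and `min_support` are illustrative assumptions. Each pass of the `while` loop rescans all transactions, which is the per-iteration cost that the abstract identifies as expensive when every iteration goes through disk.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Illustrative textbook Apriori; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the frequent ones.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        # and keep only those whose (k-1)-subsets are all frequent (pruning).
        prev = set(frequent)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one full scan of the transactions per iteration.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(
        apriori(baskets, min_support=3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
    ):
        print(sorted(itemset), support)
```

In a distributed setting the support-counting scan is the step that would be spread across machines, while candidate generation depends on the result of the previous iteration; that dependency is why the algorithm benefits from a platform with efficient iteration support.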

Bibliographic details

Source
International Conference on Advances in Computing, Communications and Informatics, 2016, pp. 739-745 (7 pages)
Conference location: Jaipur (IN)
Authors
Sanjay Rathee; Arti Kashyap;
Author affiliation

School of Computing and Electrical Engineering, I.I.T. Mandi, Himachal Pradesh, India

Original format: PDF
Keywords
Itemsets; Algorithm design and analysis; Sparks; Data mining; Clustering algorithms; Iterative methods; Informatics;
