Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach

Karim Md. Rezaul; Cochez Michael; Beyan Oya Deniz; Ahmed Chowdhury Farhan; Decker Stefan



Abstract

Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is important for business intelligence. MFPs, as the smallest set of patterns, help reveal customers' purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. These approaches are therefore not only unscalable but also lack parallelism, so they cannot meet the requirements of ever-growing big data sources. In addition, the mining performance of some existing approaches degrades drastically in the presence of null transactions. We therefore propose an efficient way to mine MFPs with Apache Spark that overcomes these issues. For faster computation and efficient memory utilization, we use a prime-number-based data transformation technique in which the values of individual transactions are preserved. After null transactions and infrequent items are removed, the transformed dataset is denser than the original distribution. We tested the proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scales to large dataset sizes. (C) 2017 Elsevier Inc. All rights reserved.
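The abstract does not spell out the transformation, so the following is only a minimal single-machine sketch of one common way a prime-number-based encoding can work, not the authors' Spark implementation. It assumes that each frequent item is mapped to a distinct prime, that a transaction is stored as the product of its items' primes, and that a "null transaction" is one left with fewer than two frequent items after pruning. The function names (`first_primes`, `transform`, `support`) and the sample data are hypothetical.

```python
from collections import Counter
from math import gcd, prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for small item alphabets)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def transform(transactions, min_support):
    """Prune infrequent items and null transactions, then encode each
    remaining transaction as the product of its items' primes."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    prime_of = dict(zip(frequent, first_primes(len(frequent))))
    encoded = []
    for t in transactions:
        kept = {i for i in t if i in prime_of}
        if len(kept) < 2:  # assumption: treated as a null transaction
            continue
        encoded.append(prod(prime_of[i] for i in kept))
    return prime_of, encoded


def support(pattern_code, encoded):
    """Count transactions containing the pattern: containment is divisibility."""
    return sum(1 for code in encoded if code % pattern_code == 0)


# Hypothetical market-basket data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["tea"],
    ["bread", "eggs", "tea"],
]
prime_of, encoded = transform(transactions, min_support=3)
bread_milk = prime_of["bread"] * prime_of["milk"]
bread_milk_support = support(bread_milk, encoded)
common_items_code = gcd(encoded[1], encoded[2])  # items shared by two transactions
```

Under these assumptions each transaction becomes a single integer, a subset test becomes a modulo operation, and the items shared by two transactions are recovered with a greatest common divisor. In a Spark setting the same per-transaction encoding could be applied as a map over a distributed collection, but that mapping is not described in the abstract.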


Bibliographic details

Source
Information Sciences: An International Journal | 2018, issue 2018 | 23 pages in total
Authors
Karim Md. Rezaul; Cochez Michael; Beyan Oya Deniz; Ahmed Chowdhury Farhan; Decker Stefan
Affiliations

Fraunhofer Inst Appl Informat Technol FIT DE-53754 St Augustin Germany;

Fraunhofer Inst Appl Informat Technol FIT DE-53754 St Augustin Germany;

Fraunhofer Inst Appl Informat Technol FIT DE-53754 St Augustin Germany;

Univ Dhaka Dept Comp Sci & Engn Dhaka 1000 Bangladesh;

Fraunhofer Inst Appl Informat Technol FIT DE-53754 St Augustin Germany;

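Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Automatic information theory; Computer applications; Information and knowledge dissemination; Automation technology, computer technology
Keywords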
Big data; Transactional databases; Dynamic data streams; Null transactions; Prime number theory; Data mining; Apache Spark; Maximal frequent patterns;


