Journal: Information Sciences: An International Journal

Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach


Abstract

Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is important for business intelligence. As the most compact set of frequent patterns, MFPs help reveal customers' purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. These approaches therefore scale poorly and lack parallelism, so they cannot meet the requirements of ever-growing big data sources. In addition, the mining performance of some existing approaches degrades drastically in the presence of null transactions. To overcome these issues, we propose an efficient way to mine MFPs with Apache Spark. For faster computation and efficient use of memory, we use a prime-number-based data transformation technique in which the values of individual transactions are preserved. After null transactions and infrequent items are removed, the transformed dataset becomes denser than the original distribution. We tested the proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scales to large datasets. (C) 2017 Elsevier Inc. All rights reserved.
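
The abstract does not spell out the transformation, so the following is only a minimal single-machine sketch of one common way a prime-number-based encoding can work, not the authors' algorithm and not Spark code. It assumes that each frequent item is mapped to a distinct prime, that a transaction is stored as the product of its items' primes (so a subset test becomes a divisibility test), and that a "null transaction" is one with fewer than two frequent items. All function names are hypothetical, and the search is a brute-force enumeration of patterns with two or more items, chosen for clarity rather than speed.

```python
from itertools import combinations
from collections import Counter


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def transform(transactions, min_support):
    """Drop infrequent items and null transactions, then encode with primes."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    prime_of = dict(zip(frequent, first_primes(len(frequent))))
    encoded = []
    for t in transactions:
        kept = {item for item in t if item in prime_of}
        # Assumption: fewer than two frequent items makes a transaction "null".
        if len(kept) < 2:
            continue
        product = 1
        for item in kept:
            product *= prime_of[item]
        encoded.append(product)
    return prime_of, encoded


def support(pattern_code, encoded):
    """Count transactions whose code is divisible by the pattern's code."""
    return sum(1 for code in encoded if code % pattern_code == 0)


def maximal_frequent_patterns(transactions, min_support):
    """Brute-force search, largest patterns first; for illustration only."""
    prime_of, encoded = transform(transactions, min_support)
    items = sorted(prime_of)
    maximal = []
    for size in range(len(items), 1, -1):
        for combo in combinations(items, size):
            if any(set(combo) <= m for m in maximal):
                continue  # already covered by a larger frequent pattern
            code = 1
            for item in combo:
                code *= prime_of[item]
            if support(code, encoded) >= min_support:
                maximal.append(set(combo))
    return maximal
```

With the toy input `[["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["d"], ["b", "c", "e"]]` and a minimum support of 2, items `d` and `e` are dropped as infrequent, the `["d"]` transaction is discarded as null, and the remaining transactions are stored as the integers 30, 30, 6 and 15. In a distributed setting the per-transaction encoding and filtering would presumably run as independent map and filter steps, but that mapping onto Spark is an assumption here.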
