
Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach



Abstract

Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is important for business intelligence. MFPs, as the smallest set of patterns, help reveal customers' purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. As a result, these approaches neither scale well nor exploit parallelism, and they cannot meet the requirements of ever-growing big data sources. In addition, the mining performance of some existing approaches degrades drastically in the presence of null transactions. We therefore propose an efficient way to mine MFPs with Apache Spark that overcomes these issues. For faster computation and efficient use of memory, we use a prime-number-based data transformation technique in which the values of individual transactions are preserved. After null transactions and infrequent items are removed, the transformed dataset is denser than the original. We tested the proposed algorithms on both real static TDBs and DDSs. Experimental results and performance analysis show that our approach is efficient and scales to large datasets.
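The abstract does not spell out the prime-number-based transformation, so the following is only a minimal single-machine sketch of one common reading of the idea, not the authors' algorithm and not a Spark implementation. It assumes that each frequent item is mapped to a distinct prime, that a transaction is encoded as the product of its items' primes (so the item set can be recovered by factorization), and that a "null transaction" is one left with no frequent item after filtering. The function names, the toy dataset, and the brute-force enumeration of candidates are all hypothetical and chosen for illustration.

```python
from itertools import combinations
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def transform(transactions, min_support):
    """Drop infrequent items and null transactions, then encode as prime products."""
    counts = {}
    for t in transactions:
        for item in set(t):  # count each item once per transaction
            counts[item] = counts.get(item, 0) + 1
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    prime_of = dict(zip(frequent, first_primes(len(frequent))))
    encoded = []
    for t in transactions:
        kept = {i for i in t if i in prime_of}
        if kept:  # assumption: a transaction with no frequent item is "null"
            encoded.append(prod(prime_of[i] for i in kept))
    return prime_of, encoded


def support(pattern_code, encoded):
    """A pattern occurs in a transaction iff its code divides the transaction's code."""
    return sum(1 for code in encoded if code % pattern_code == 0)


def maximal_frequent_patterns(prime_of, encoded, min_support):
    """Brute-force enumeration; only suitable for tiny illustrative inputs."""
    items = sorted(prime_of)
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            code = prod(prime_of[i] for i in combo)
            if support(code, encoded) >= min_support:
                frequent.append(frozenset(combo))
    # A frequent pattern is maximal if no frequent proper superset exists.
    return [p for p in frequent if not any(p < q for q in frequent)]


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs"],
    ["tea"],
]
prime_of, encoded = transform(transactions, min_support=2)
mfps = maximal_frequent_patterns(prime_of, encoded, min_support=2)
```

With this encoding, a subset test reduces to a single modulo operation on integers, which is one plausible reason such a transformation could save memory and computation. In a distributed setting the per-transaction encoding and the support counts would presumably be computed in parallel across partitions, but that part is not shown here.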
