Journal: Engineering Applications of Artificial Intelligence

Mining of frequent patterns with multiple minimum supports



Abstract

Frequent pattern mining (FPM) is an important topic in data mining for discovering implicit but useful information. Many algorithms have been proposed for this task, but most share an important limitation: they rely on a single uniform minimum support threshold as the sole criterion for identifying frequent patterns (FPs). Using a single threshold to assess the usefulness of all items in a database is inadequate and unfair in real-life applications, since items differ and should not all be treated alike. Several algorithms have been developed for mining FPs with multiple minimum supports, but most of them are time-consuming and require a large amount of memory. In this paper, we address this issue by introducing a novel approach named Frequent Pattern mining with Multiple minimum supports from the Enumeration-tree (FP-ME). In the developed Set-Enumeration-tree with Multiple minimum supports (ME-tree) structure, a new sorted downward closure (SDC) property of FPs and the least minimum support (LMS) concept with multiple minimum supports are used to effectively prune the search space. The proposed FP-ME algorithm can directly discover FPs from the ME-tree without candidate generation. Moreover, an improved algorithm, named FP-ME_DiffSet, is also developed based on the DiffSet concept to further increase mining performance. Substantial experiments on both real-life and synthetic datasets show that the proposed algorithms not only avoid the "rare item problem" but also efficiently and effectively discover the complete set of FPs in transactional databases while considering multiple minimum supports, and that they outperform the state-of-the-art CFP-growth++ algorithm in terms of execution time, memory usage, and scalability.
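
To make the idea of multiple minimum supports concrete, the following is a minimal Python sketch and not the authors' FP-ME algorithm or ME-tree. It assumes the definition commonly used in this line of work, which the abstract does not spell out: each item has its own minimum item support (MIS), and an itemset is frequent when its support count is at least the smallest MIS among its items. It also assumes that the least minimum support (LMS) is the smallest MIS over all items. The function name `mine_with_multiple_supports`, the absolute-count thresholds, and the toy data are hypothetical.

```python
def mine_with_multiple_supports(transactions, mis):
    """Return {frozenset(itemset): support count} for every itemset whose
    support is >= the smallest MIS among its items (assumed definition)."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Assumed LMS: the smallest MIS over all items. An item whose support is
    # below it cannot belong to any frequent itemset, so it is dropped.
    lms = min(mis[item] for item in tidsets)
    # Ascending-MIS order: the first item of an enumerated itemset then
    # always carries that itemset's smallest MIS.
    order = sorted((i for i in tidsets if len(tidsets[i]) >= lms),
                   key=lambda i: (mis[i], i))
    found = {}

    def extend(prefix, prefix_tids, start):
        for k in range(start, len(order)):
            item = order[k]
            tids = tidsets[item] if prefix_tids is None else prefix_tids & tidsets[item]
            itemset = prefix + (item,)
            # Every itemset in this subtree shares the same first item, hence
            # the same threshold; support only shrinks, so pruning is safe.
            if len(tids) < mis[itemset[0]]:
                continue
            found[frozenset(itemset)] = len(tids)
            extend(itemset, tids, k + 1)

    extend((), None, 0)
    return found


# Toy data: "caviar" is rare, so it gets a lower threshold than the staples.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk"},
    {"bread", "caviar"},
]
mis = {"bread": 3, "milk": 3, "caviar": 2}
found = mine_with_multiple_supports(transactions, mis)
```

With a single uniform threshold of 3, every pattern containing "caviar" would be discarded, which is the "rare item problem" the abstract mentions; with per-item thresholds, {caviar} and {caviar, bread} are retained while {bread, milk} (support 2, threshold 3) is still rejected. The ascending-MIS ordering is only one simple way to obtain a safe pruning rule and should not be read as the paper's SDC property.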
