...

Mining top-k high-utility itemsets from a data stream under sliding window model


Abstract

High-utility itemset mining has gained significant attention in the past few years. It aims to find sets of items (itemsets) in a database whose utility is no less than a user-defined threshold. The notion of utility gives an analyst more flexibility in mining relevant itemsets. Nowadays, continuous and unbounded streams of data are generated by web clicks, retail transaction flows, sensor networks, etc. Mining high-utility itemsets from a data stream is a challenging task, as the incoming data has to be processed on the fly under time and memory constraints. The number of high-utility itemsets depends on the user-defined threshold: a very low threshold can yield a large number of itemsets, while a very high threshold can yield very few. Setting a threshold that returns a reasonable number of itemsets can therefore be tedious. Top-k high-utility itemset mining was introduced to address this issue, where k is the number of high-utility itemsets the user wants in the result set. In this paper, we propose a data structure and an efficient algorithm for mining top-k high-utility itemsets from a data stream. The algorithm has a single phase that does not generate any candidates, unlike many algorithms that work in two phases, i.e., candidate generation followed by candidate verification. We conduct extensive experiments on several real and synthetic datasets. Experimental results demonstrate that, in terms of computation time, our proposed algorithm is 20 to 80 times faster on sparse datasets and 300 to 700 times faster on dense datasets than the state-of-the-art algorithm. Furthermore, our proposed algorithm requires less memory than the state-of-the-art algorithm.
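
To make the problem statement concrete, the following is a minimal brute-force sketch of top-k high-utility itemset mining over a sliding window. It is not the single-phase algorithm or the data structure proposed in the paper, which the abstract does not describe. The sketch assumes the common definition of utility (purchased quantity times a per-item unit profit, summed over the transactions containing the itemset) and a window made of the most recent `window_size` transactions; the paper's window model may differ. The names `PROFIT`, `STREAM`, `top_k_in_window`, and `mine_stream` are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits per item (assumed for illustration, not from the paper).
PROFIT = {"a": 5, "b": 2, "c": 1}

# Hypothetical stream: each transaction maps an item to its purchased quantity.
STREAM = [
    {"a": 1, "b": 2},
    {"b": 1, "c": 3},
    {"a": 2, "c": 1},
]


def top_k_in_window(window, profit, k):
    """Return the k itemsets with the highest total utility in `window`.

    Brute force: enumerates every non-empty subset of every transaction,
    so it is exponential in transaction length and only suits tiny inputs.
    """
    totals = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                # Utility of the itemset in this transaction.
                utility = sum(transaction[i] * profit[i] for i in itemset)
                totals[itemset] = totals.get(itemset, 0) + utility
    # Highest utility first; ties are broken by itemset for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def mine_stream(stream, profit, k, window_size):
    """Yield the top-k itemsets each time the sliding window is full."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield top_k_in_window(window, profit, k)


for result in mine_stream(STREAM, PROFIT, k=2, window_size=2):
    print(result)
# [(('a', 'b'), 9), (('b',), 6)]
# [(('a', 'c'), 11), (('a',), 10)]
```

The user supplies only `k`; no utility threshold has to be guessed. Recomputing every window from scratch, as done here, is exactly the cost a dedicated stream algorithm aims to avoid.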
