Mining top-k high-utility itemsets from a data stream under sliding window model

Dawar Siddharth; Sharma Veronica; Goyal Vikram


Abstract

High-utility itemset mining has gained significant attention in the past few years. It aims to find sets of items, i.e., itemsets, in a database whose utility is no less than a user-defined threshold. The notion of utility gives an analyst more flexibility to mine relevant itemsets. Nowadays, continuous and unbounded streams of data are generated from web clicks, transaction flows from retail stores, sensor networks, etc. Mining high-utility itemsets from a data stream is a challenging task, as the incoming stream has to be processed on the fly under time and memory constraints. The number of high-utility itemsets depends on the user-defined threshold: a very low threshold can yield a large number of itemsets, while a very high threshold can yield very few. Setting a threshold that returns a reasonable number of itemsets can therefore be tedious. Top-k high-utility itemset mining was introduced to address this issue, where k is the number of high-utility itemsets in the result set, as specified by the user. In this paper, we propose a data structure and an efficient algorithm for mining top-k high-utility itemsets from a data stream. The algorithm has a single phase and does not generate any candidates, unlike many algorithms that work in two phases, i.e., candidate generation followed by candidate verification. We conduct extensive experiments on several real and synthetic datasets. Experimental results demonstrate that, in terms of computation time, our proposed algorithm is 20 to 80 times faster on sparse datasets and 300 to 700 times faster on dense datasets than the state-of-the-art algorithm. Furthermore, our proposed algorithm requires less memory than the state-of-the-art algorithm.
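
To make the problem statement concrete, the following is a minimal brute-force sketch in Python of top-k high-utility itemset mining over a sliding window. It is not the single-phase algorithm or data structure proposed in the paper, which the abstract does not describe. It assumes the commonly used definition in which each transaction maps an item to its utility in that transaction, and the utility of an itemset is the sum of its items' utilities over all transactions in the window that contain every item. The function names, the one-transaction-at-a-time window slide, and the sample data are all hypothetical.

```python
from collections import deque
from itertools import combinations


def itemset_utility(itemset, window):
    """Sum the utility of `itemset` over the transactions that contain all of its items.

    Assumption: each transaction is a dict mapping item -> utility in that transaction.
    """
    total = 0
    for tx in window:
        if all(item in tx for item in itemset):
            total += sum(tx[item] for item in itemset)
    return total


def top_k_in_window(window, k):
    """Exhaustively score every itemset over the window's items and keep the k best.

    This enumeration is exponential in the number of distinct items; it only
    illustrates the problem definition, not an efficient method.
    """
    items = sorted({item for tx in window for item in tx})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, window)
            if utility > 0:
                scored.append((utility, itemset))
    # Highest utility first; ties are broken by itemset order for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


def sliding_top_k(stream, window_size, k):
    """Yield the top-k itemsets each time the window is full.

    Assumption: the window slides by one transaction at a time.
    """
    window = deque(maxlen=window_size)  # the oldest transaction is dropped automatically
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            yield top_k_in_window(window, k)


if __name__ == "__main__":
    # Hypothetical stream of four transactions.
    stream = [
        {"a": 5, "b": 2},
        {"b": 4, "c": 3},
        {"a": 1, "c": 6},
        {"a": 2, "b": 2, "c": 1},
    ]
    for result in sliding_top_k(stream, window_size=3, k=2):
        print(result)
```

Note that no utility threshold appears anywhere in the sketch: the user supplies only k, which is the motivation for the top-k formulation given in the abstract.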


Bibliographic record

Source
Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies | 2017, Issue 4 | 16 pages
Authors
Dawar Siddharth; Sharma Veronica; Goyal Vikram
Author affiliations

Indraprastha Inst Informat Technol Dept Comp Sci Delhi India;

Indraprastha Inst Informat Technol Dept Comp Sci Delhi India;

Indraprastha Inst Informat Technol Dept Comp Sci Delhi India;

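Indexing information
Original format: PDF
Text language: English (eng)
Chinese Library Classification: Automation technology, computer technology
Keywords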
Data mining; Pattern mining; Utility mining; Data streams; Top-k high utility mining;


Similar documents

1. Mining top-k high-utility itemsets from a data stream under sliding window model [J]. Computing Reviews. 2018, Issue 7.
2. Mining top-k frequent closed itemsets over data streams using the sliding window model [J]. Pauray S.M. Tsai. Expert Systems with Applications. 2010, Issue 10.
3. Mining High Utility Itemsets in Data Streams Based on the Weighted Sliding Window Model [J]. Pauray S.M. Tsai. International Journal of Data Mining & Knowledge Management Process. 2014, Issue 2.
4. Mining Top-k Frequent-regular Itemsets from Data Streams Based on Sliding Window Technique [C]. Tashinee Mesama, Komate Amphawan. International Conference on Advanced Informatics: Concept Theory and Applications. 2018.
5. Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining [D]. Hao, Boyu. 2010.
6. Reducing False Negative Reads in RFID Data Streams Using an Adaptive Sliding-Window Approach [O]. Libe Valentine Massawe, Johnson D. M. Kinyua, Herman Vermaak. 2012.
7. Max-FISM: Mining (recently) maximal frequent itemsets over data streams using the sliding window model [O]. Farzanyar Zahra, Kangavari Mohammadreza, Cercone Nick. 2012.
8. Data Stream Mining Based Dynamic Link Anomaly Analysis Using Paired Sliding Time Window Data [R]. Han, K., Zhang, T., Liao, Q. 2014.
