Conference: IEEE International Conference on Data Mining Workshops
TKC: Mining Top-K Cross-Level High Utility Itemsets

Abstract

High utility itemset mining is a well-studied data mining task for analyzing customer transactions. The goal is to find all high utility itemsets, that is, sets of items purchased together that generate a profit equal to or greater than a user-defined minimum utility threshold. However, a limitation of traditional high utility itemset mining algorithms is that item categories (e.g. drinks, dairy products) are ignored. Recently, two algorithms were designed to find multi-level and cross-level high utility itemsets, which reveal relationships between items and/or categories of items. This is achieved by considering a product taxonomy, where items are organized into a hierarchy. Though these algorithms can reveal interesting patterns, setting the minimum utility threshold is not intuitive, and the chosen value greatly influences both the number of patterns found and the algorithms' performance. If the user sets the threshold too low, a huge number of patterns is found and runtimes can be very long, while if the threshold is set too high, few patterns are found. Hence, a user often has to run an algorithm numerous times to find a threshold value that yields just enough patterns. This paper addresses this issue by presenting a novel algorithm called TKC (Top-K Cross-level high utility itemset miner), which lets the user directly set the number of patterns $k$ to be discovered. TKC performs a depth-first search and includes search space pruning techniques and an optimization to enhance its performance. Experiments were done on retail data with taxonomy information. Results indicate that the algorithm is efficient and that the optimization improves its performance.
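The idea of replacing a utility threshold with a pattern count $k$ can be illustrated with a small sketch. The abstract does not describe TKC's actual pruning techniques or its optimization, so the code below is not TKC. It is a minimal depth-first top-k search that assumes a tree-shaped taxonomy given as a child-to-parent map, non-negative utilities, and transactions that list only leaf items, and it swaps in the standard transaction-weighted utilization (TWU) upper bound for pruning. The names `top_k_cross_level`, `ancestors`, and `related`, as well as the sample data, are hypothetical.

```python
import heapq
from itertools import count


def ancestors(item, parent):
    """Yield every ancestor of `item` in the taxonomy (child -> parent map)."""
    while item in parent:
        item = parent[item]
        yield item


def related(a, b, parent):
    """True if the items are equal or one is an ancestor of the other."""
    return a == b or a in ancestors(b, parent) or b in ancestors(a, parent)


def top_k_cross_level(transactions, parent, k):
    """Illustrative top-k cross-level search (not the TKC algorithm itself)."""
    # Add generalized items to each transaction: a category's utility is
    # the summed utility of its descendants present in that transaction.
    extended, totals = [], []
    for t in transactions:
        ext = dict(t)
        for item, u in t.items():
            for anc in ancestors(item, parent):
                ext[anc] = ext.get(anc, 0) + u
        extended.append(ext)
        totals.append(sum(t.values()))  # transaction utility (leaf items only)

    items = sorted({i for ext in extended for i in ext})
    heap = []       # min-heap of (utility, tie_breaker, itemset)
    tie = count()   # keeps heap entries comparable when utilities tie
    min_util = 0    # internal threshold, raised once k patterns are held

    def search(prefix, tids, start):
        nonlocal min_util
        for pos in range(start, len(items)):
            item = items[pos]
            # An itemset may not contain an item together with its ancestor.
            if any(related(item, p, parent) for p in prefix):
                continue
            new_tids = [t for t in tids if item in extended[t]]
            if not new_tids:
                continue
            # TWU bounds the utility of this itemset and all its supersets.
            if sum(totals[t] for t in new_tids) < min_util:
                continue
            itemset = prefix + (item,)
            utility = sum(extended[t][i] for t in new_tids for i in itemset)
            if len(heap) < k:
                heapq.heappush(heap, (utility, next(tie), itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, next(tie), itemset))
            if len(heap) == k:
                min_util = heap[0][0]
            search(itemset, new_tids, pos + 1)

    search((), list(range(len(transactions))), 0)
    return sorted(((u, s) for u, _, s in heap), reverse=True)


# Hypothetical sample data: each value is an item's profit in a transaction.
parent = {"cola": "drinks", "juice": "drinks", "milk": "dairy", "cheese": "dairy"}
transactions = [
    {"cola": 4, "milk": 3},
    {"juice": 2, "cheese": 6},
    {"cola": 2, "juice": 1, "milk": 5},
]
top2 = top_k_cross_level(transactions, parent, 2)
```

On this sample, `top2` holds the cross-level itemsets `("dairy", "drinks")` with utility 23 and `("drinks", "milk")` with utility 15. The user supplies only `k`, and the internal threshold `min_util` rises automatically as better patterns are found, which is the general mechanism that top-k miners rely on.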
