Representative Itemset Mining

Abstract

Frequent itemset mining is one of the most common data mining tasks. In its simplest form, one is given a table of data in which the columns represent attributes and each row specifies a value for each attribute, each attribute-value pair being referred to as an item. The task is to find sets of these items that occur frequently in the data, where frequency is specified as a minimum occurrence threshold. Such frequent sets of items are referred to as "frequent itemsets". Many efficient techniques have been developed for finding all frequent itemsets. However, a practical problem is that the result sets can be exponentially large in the number of items. In this paper we propose representative frequent itemset mining, in which the set of itemsets returned provides examples of the space of all possible frequent itemsets. Specifically, every item that appears in a frequent itemset at least once is shown in at least one representative itemset. If there are frequent itemsets without a particular item, one such itemset is also presented. One can generalise our framework to seek representative sets in which pairs, triples, etc. of frequent itemsets are presented. One can see the representative frequent itemset framework as a generalisation of traditional frequent itemset mining that provides an additional parameter for controlling the size of the result set. Specifically, one has access to the traditional frequency threshold, but also to the maximum arity of the tuples of itemsets being exemplified. We propose a dedicated algorithm that significantly outperforms a state-of-the-art itemset miner at generating representative itemsets.
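
To make the definitions above concrete, here is a minimal Python sketch. It is not the dedicated algorithm the abstract mentions: it enumerates all frequent itemsets with a naive level-wise search and then picks representatives with a greedy cover, which is exactly the "mine everything first" baseline the paper aims to beat. The function names (`frequent_itemsets`, `representative_itemsets`), the toy table, and the greedy selection rule are all assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(rows, min_support):
    """Map each frequent itemset to its support (naive level-wise search).

    Each row is a dict attribute -> value; an item is an (attribute, value) pair.
    """
    transactions = [frozenset(row.items()) for row in rows]
    items = {item for t in transactions for item in t}
    frequent = {}
    level = [frozenset([item]) for item in items]
    while level:
        # Count how many rows contain each candidate and keep the frequent ones.
        counts = {s: sum(1 for t in transactions if s <= t) for s in level}
        kept = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets that differ in one item into (k+1)-candidates.
        level = list({a | b for a, b in combinations(list(kept), 2)
                      if len(a | b) == len(a) + 1})
    return frequent

def representative_itemsets(frequent):
    """Greedily pick frequent itemsets so that every item occurring in some
    frequent itemset is shown at least once and, where a frequent itemset
    without that item exists, one such itemset is shown as well."""
    freq_items = set().union(*frequent)

    def covers(s):
        return ({("with", i) for i in s}
                | {("without", i) for i in freq_items - s})

    uncovered = set().union(*(covers(s) for s in frequent))
    chosen = []
    while uncovered:
        # Prefer the itemset covering the most open requirements; break ties by size.
        best = max(frequent, key=lambda s: (len(covers(s) & uncovered), len(s)))
        chosen.append(best)
        uncovered -= covers(best)
    return chosen

# Hypothetical toy table: two attributes, four rows.
rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
]
frequent = frequent_itemsets(rows, min_support=2)
reps = representative_itemsets(frequent)
```

With a minimum support of 2, this toy table has four frequent itemsets, and the greedy step returns two of them that together show each frequent item both present and absent. The sketch covers only the single-item case; the generalisation to pairs or triples described in the abstract would need a correspondingly larger set of coverage requirements.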
