Representative Itemset Mining

Abstract

Frequent itemset mining is one of the most common data mining tasks. In its simplest form, one is given a table of data in which the columns represent attributes and each row specifies a value for each attribute; each attribute-value pair is referred to as an item. The task is to find sets of these items that occur frequently in the data, where frequency is specified as a minimum occurrence threshold. Such sets are referred to as "frequent itemsets". Many efficient techniques have been developed for finding all frequent itemsets. However, a practical problem is that the result set can be exponentially large in the number of items. In this paper we propose representative frequent itemset mining, in which the set of itemsets returned provides examples of the space of all possible frequent itemsets. Specifically, every item that appears in at least one frequent itemset is shown in at least one representative itemset. If there are frequent itemsets without a particular item, one such itemset is also presented. One can generalise our framework to seek representative sets in which pairs, triples, etc. of frequent itemsets are presented. The representative frequent itemset framework can thus be seen as a generalisation of traditional frequent itemset mining that provides an additional parameter for controlling the size of the result set: one has access to the traditional frequency threshold, but also to the maximum arity of the tuples of itemsets being exemplified. We propose a dedicated algorithm that significantly outperforms a state-of-the-art itemset miner at generating representative itemsets.
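To make the definitions in the abstract concrete, the following is a minimal Python sketch of the simplest case (single items, i.e. arity 1). It is not the dedicated algorithm the paper proposes, which the abstract does not describe. It assumes a brute-force level-wise enumeration of all frequent itemsets followed by a greedy cover, and the names `frequent_itemsets`, `representatives`, `rows` and `min_support` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Enumerate all non-empty frequent itemsets level by level (brute force).

    rows: list of dicts mapping attribute -> value. Each (attribute, value)
    pair is an item; an itemset is frequent if at least min_support rows
    contain all of its items.
    """
    transactions = [frozenset(row.items()) for row in rows]
    items = sorted({item for t in transactions for item in t}, key=repr)
    frequent = []
    level = [frozenset([item]) for item in items]
    while level:
        kept = [s for s in level
                if sum(s <= t for t in transactions) >= min_support]
        frequent.extend(kept)
        # Candidates one item larger: unions of two kept sets of this size.
        size = len(kept[0]) + 1 if kept else 0
        level = list({a | b for a, b in combinations(kept, 2)
                      if len(a | b) == size})
    return frequent


def representatives(frequent):
    """Greedily pick itemsets so that every item occurring in some frequent
    itemset is shown in a chosen itemset and, if some frequent itemset lacks
    that item, a chosen itemset lacks it too (illustrative heuristic only)."""
    items = {item for s in frequent for item in s}

    def covers(s):
        return ({("with", i) for i in s}
                | {("without", i) for i in items - s})

    todo = set()
    for s in frequent:
        todo |= covers(s)
    chosen = []
    while todo:
        best = max(frequent, key=lambda s: len(covers(s) & todo))
        chosen.append(best)
        todo -= covers(best)
    return chosen


# Toy table: two attributes, four rows (made-up data).
rows = [
    {"colour": "red", "size": "S"},
    {"colour": "red", "size": "M"},
    {"colour": "red", "size": "S"},
    {"colour": "blue", "size": "S"},
]
freq = frequent_itemsets(rows, min_support=2)
reps = representatives(freq)
print(len(freq), "frequent itemsets,", len(reps), "representatives")
```

The greedy step does not guarantee a smallest representative set, and enumerating every frequent itemset first is exactly the cost that a dedicated algorithm would aim to avoid; the sketch only illustrates what the output must satisfy.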

Bibliographic details

Source: IEEE International Conference on Tools with Artificial Intelligence, 2016, pp. 142-148 (7 pages)
Authors: Hong Huang; Barry OSullivan
Original format: PDF
Keywords: Itemsets; Data mining; Focusing; Conferences
