Frequent itemset mining is one of the most common data mining tasks. In its simplest form, one is given a table of data in which the columns represent attributes and each row specifies a value for each attribute; each attribute-value pair is referred to as an item. The task is to find sets of these items that occur frequently in the data, where "frequently" is specified by a minimum occurrence threshold. Such sets of items are referred to as "frequent itemsets". Many efficient techniques have been developed for finding all frequent itemsets. However, a practical problem is that the result set can be exponentially large in the number of items. In this paper we propose representative frequent itemset mining, in which the set of itemsets returned provides examples drawn from the space of all frequent itemsets. Specifically, every item that appears in at least one frequent itemset is shown in at least one representative itemset. Likewise, if there are frequent itemsets that do not contain a particular item, one such itemset is presented. One can generalise our framework to seek representative sets in which pairs, triples, etc. of frequent itemsets are presented. The representative frequent itemset framework can thus be seen as a generalisation of traditional frequent itemset mining that provides an additional parameter for controlling the size of the result set: one has access to the traditional frequency threshold, but also to the maximum arity of the tuples of itemsets being exemplified. We propose a dedicated algorithm that significantly outperforms a state-of-the-art itemset miner at generating representative itemsets.
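To make the definitions above concrete, the following is a minimal sketch of the simplest (single-item) case on a toy table. It is not the dedicated algorithm proposed in the paper, whose details are not given in this abstract: it enumerates frequent itemsets by brute force and then picks representatives with a simple greedy cover. The function names `frequent_itemsets` and `representatives`, the toy data, and the greedy selection strategy are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Brute-force enumeration (illustrative only, exponential in the worst case).

    rows: list of dicts mapping attribute -> value; each (attribute, value)
    pair is an item. Returns {itemset: support} for itemsets whose support
    is at least min_support.
    """
    transactions = [frozenset(row.items()) for row in rows]
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return frequent


def representatives(frequent):
    """Greedy cover (an assumed strategy, not the paper's algorithm).

    Every item occurring in some frequent itemset must appear in a chosen
    itemset, and for every item that some frequent itemset lacks, a chosen
    itemset must lack it too.
    """
    itemsets = sorted(frequent, key=lambda s: (-len(s), sorted(s)))
    all_items = set().union(*itemsets) if itemsets else set()

    needs = set()
    for item in all_items:
        needs.add(("with", item))
        if any(item not in s for s in itemsets):
            needs.add(("without", item))

    def covers(s):
        return {("with", i) for i in s} | {("without", i) for i in all_items - s}

    chosen = []
    while needs:
        best = max(itemsets, key=lambda s: len(covers(s) & needs))
        chosen.append(best)
        needs -= covers(best)
    return chosen


# Toy table: two attributes, four rows; minimum occurrence threshold of 2.
rows = [
    {"colour": "red", "size": "L"},
    {"colour": "red", "size": "L"},
    {"colour": "red", "size": "S"},
    {"colour": "blue", "size": "S"},
]
freq = frequent_itemsets(rows, min_support=2)
reps = representatives(freq)
for itemset in reps:
    print(sorted(itemset), "support =", freq[itemset])
```

The brute-force enumeration is only practical for very small tables; it is used here so that the coverage conditions are easy to check by hand, not as a suggestion of how representative itemsets should be mined at scale.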