A Performance Study of Three Disk-based Structures for Indexing and Querying Frequent Itemsets

Abstract

Frequent itemset mining is an important problem in data mining. Extensive effort has been devoted to developing efficient algorithms for mining frequent itemsets. However, little attention has been paid to managing the large collections of frequent itemsets that these algorithms produce for subsequent analysis and user exploration. In this paper, we study three structures for indexing and querying frequent itemsets: inverted files, signature files and the CFP-tree. The first two structures have been widely used for indexing general set-valued data; we modify them to make them more suitable for indexing frequent itemsets. The CFP-tree structure is specially designed for storing frequent itemsets; we add a pruning technique based on length-2 frequent itemsets to make it more efficient at processing superset queries. We study the performance of the three structures in supporting five types of containment queries: exact match, subset search, superset search, immediate subset search and immediate superset search. Our results show that no single structure outperforms the others for all five query types on all datasets, but the CFP-tree shows better overall performance than the other two structures.
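
To make the five containment queries named in the abstract concrete, here is a minimal in-memory sketch of an inverted file over a small collection of itemsets. It is an illustration only, not the disk-based structure evaluated in the paper. The class name `InvertedIndex`, its method names, and the reading of "immediate" as "differing from the query by exactly one item" are assumptions made for this example; the abstract does not define these terms, and the paper may define them differently.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted file: item -> ids of stored itemsets containing it.

    Hypothetical illustration; names and query semantics are assumptions.
    """

    def __init__(self, itemsets):
        self.itemsets = [frozenset(s) for s in itemsets]
        self.postings = defaultdict(set)
        for sid, items in enumerate(self.itemsets):
            for item in items:
                self.postings[item].add(sid)

    def supersets(self, query):
        """Ids of stored itemsets that contain every item of the query."""
        query = frozenset(query)
        if not query:
            return set(range(len(self.itemsets)))
        # Intersect posting lists, shortest first, to keep intermediates small.
        lists = sorted((self.postings.get(i, set()) for i in query), key=len)
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return result

    def subsets(self, query):
        """Ids of stored (non-empty) itemsets whose items all appear in the query."""
        hits = defaultdict(int)
        for item in frozenset(query):
            for sid in self.postings.get(item, ()):
                hits[sid] += 1
        # An itemset is a subset when every one of its items was hit.
        return {sid for sid, n in hits.items() if n == len(self.itemsets[sid])}

    def exact(self, query):
        """Ids of stored itemsets equal to the query."""
        q = frozenset(query)
        return {s for s in self.supersets(q) if len(self.itemsets[s]) == len(q)}

    def immediate_supersets(self, query):
        """Assumed meaning: supersets with exactly one extra item."""
        q = frozenset(query)
        return {s for s in self.supersets(q) if len(self.itemsets[s]) == len(q) + 1}

    def immediate_subsets(self, query):
        """Assumed meaning: subsets with exactly one item fewer."""
        q = frozenset(query)
        return {s for s in self.subsets(q) if len(self.itemsets[s]) == len(q) - 1}


idx = InvertedIndex([{"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}, {"b", "c"}])
print(sorted(idx.supersets({"a", "b"})))  # ids of itemsets containing both a and b
print(sorted(idx.subsets({"a", "b"})))    # ids of itemsets drawn only from {a, b}
```

Superset search intersects the posting lists of the query items, while subset search counts how many of each stored itemset's items occur in the query. A disk-based variant would additionally have to decide how posting lists are laid out and compressed, which this sketch does not attempt.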

Bibliographic record

Source
International Conference on Very Large Data Bases | 2013 | 12 pages
Authors
Guimei Liu; Andre Suchitra; Limsoon Wong
Original format: PDF
Chinese Library Classification: TP311.13
Keywords
Indexing frequent itemsets; inverted files; signature files; CFP-tree

Similar literature

1. CFP-tree: A compact disk-based structure for storing and querying frequent itemsets [J]. Guimei Liu, Hongjun Lu, Jeffrey Xu Yu. Information Systems, 2007, No. 2.
2. Performance study of distributed Apriori-like frequent itemsets mining [J]. Lamine M. Aouad, Nhien-An Le-Khac, Tahar M. Kechadi. Knowledge and Information Systems, 2010, No. 1.
3. A Performance Study of Three Disk-based Structures for Indexing and Querying Frequent Itemsets [C]. Guimei Liu, Andre Suchitra, Limsoon Wong. International Conference on Very Large Data Bases, 2013.
4. Performance comparison of spatial indexing structures for different query types [D]. Neelabh Pant. 2015.
5. A Study of the Adequacy of User and Indexing Vocabularies in Natural Language Queries to a MeSH-indexed Health Gateway [O]. N. Grabar, P. Zweigenbaum, L. Soualmia. 2002.
6. Frequent Itemset Mining for Query Expansion in Microblog Ad-hoc Search [R]. Y. Aboulnaga, C. L. Clarke. 2012.
