Using and extending itemsets in data mining : query approximation, dense itemsets, and tiles

Abstract

Frequent itemsets are one of the best known concepts in data mining, and there is active research in itemset mining algorithms. An itemset is frequent in a database if its items co-occur in sufficiently many records. This thesis addresses two questions related to frequent itemsets. The first question is raised by a method for approximating logical queries by an inclusion-exclusion sum truncated to the terms corresponding to the frequent itemsets: how good are the approximations thereby obtained? The answer is twofold: in theory, the worst-case bound for the algorithm is very large, and a construction is given that shows the bound to be tight; but in practice, the approximations tend to be much closer to the correct answer than in the worst case. While some other algorithms based on frequent itemsets yield even better approximations, they are not as widely applicable.

The second question concerns extending the definition of frequent itemsets to relax the requirement of perfect co-occurrence: highly correlated items may form an interesting set, even if they never co-occur in a single record. The problem is to formalize this idea in a way that still admits efficient mining algorithms. Two different approaches are used. First, dense itemsets are defined in a manner similar to the usual frequent itemsets and can be found using a modification of the original itemset mining algorithm. Second, tiles are defined in a different way so as to form a model for the whole data, unlike frequent and dense itemsets. A heuristic algorithm based on spectral properties of the data is given and some of its properties are explored.
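The truncated inclusion-exclusion idea from the first question can be illustrated with a small sketch. It is not taken from the thesis: the function names, the toy records, and the choice of a disjunction query ("at least one of these items occurs") are assumptions made for illustration, and itemset supports are computed by brute force rather than by a mining algorithm. The estimate sums the supports of all non-empty subsets of the queried items with alternating signs, and the truncation drops every term whose itemset falls below the support threshold.

```python
from itertools import combinations

def support(records, itemset):
    """Fraction of records that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= record for record in records) / len(records)

def disjunction_frequency(records, items, min_support=0.0):
    """Inclusion-exclusion estimate of the fraction of records that
    contain at least one of `items`.

    Terms whose itemset has support below `min_support` are dropped,
    mimicking a sum truncated to the frequent itemsets. With
    min_support=0.0 no term is dropped and the result is exact.
    """
    total = 0.0
    for size in range(1, len(items) + 1):
        sign = 1 if size % 2 == 1 else -1  # +singletons, -pairs, +triples, ...
        for subset in combinations(items, size):
            s = support(records, subset)
            if s >= min_support:
                total += sign * s
    return total

# Hypothetical toy database of eight records (illustrative only).
RECORDS = [
    frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("b"),
    frozenset("c"), frozenset("abc"), frozenset("d"), frozenset("d"),
]

exact = disjunction_frequency(RECORDS, ["a", "b", "c"])
truncated = disjunction_frequency(RECORDS, ["a", "b", "c"], min_support=0.25)
print(exact, truncated)
```

On this toy data the untruncated sum gives 0.75, the true fraction of records containing a, b, or c, while a threshold of 0.25 drops the terms for {a, c}, {b, c}, and {a, b, c} and gives 0.875. The gap between the two values is the kind of approximation error the first question of the thesis is about.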

Bibliographic record

Author
Seppänen Jouni K.
Year 2006
Original format PDF
Text language en

