Mining Frequent Itemsets in Large Data Warehouses: A Novel Approach Proposed for Sparse Data Sets

Abstract

The design of efficient techniques for discovering useful information and valuable knowledge in very large databases and data warehouses has attracted the attention of many researchers in the field of data mining. The well-known Association Rule Mining (ARM) algorithm, Apriori, searches for frequent itemsets (i.e., sets of items with acceptable support) by scanning the whole database repeatedly to count the frequency of each candidate itemset. Most of the methods proposed to improve the efficiency of the Apriori algorithm attempt to count the frequency of each itemset without re-scanning the database. However, these methods rarely offer any way to reduce the complexity of the unavoidable enumerations that are inherent in the problem. In this paper, we propose a new algorithm for mining frequent itemsets and association rules. The algorithm computes the frequency of itemsets efficiently and requires only a single scan of the database. The data is encoded into a compressed form and stored in main memory within a suitable data structure. The proposed algorithm works iteratively, and in each iteration the time required to measure the frequency of an itemset is reduced further (i.e., checking the frequency of n-dimensional candidate itemsets is much faster than checking those of n-1 dimensions). The efficiency of our algorithm is evaluated using artificial and real-life datasets. Experimental results indicate that our algorithm is more efficient than existing algorithms.
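
The abstract does not say which encoding or data structure the authors use, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes a vertical bitmap layout (one integer bitmap per item, built in a single pass over the transactions) and counts support with bitwise AND, which is one common reading of the keyword "logical operations". The names `encode`, `frequent_itemsets`, and the sample transactions are hypothetical.

```python
from itertools import combinations


def encode(transactions):
    """Single pass over the data: bit t of an item's bitmap is set
    when transaction t contains that item (assumed encoding)."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap):
    """Number of transactions represented by a bitmap."""
    return bin(bitmap).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise search; the database is read only inside encode()."""
    bitmaps = encode(transactions)
    # Level 1: frequent single items.
    level = {frozenset([item]): bits
             for item, bits in bitmaps.items()
             if support(bits) >= min_support}
    result = {}
    while level:
        result.update({s: support(b) for s, b in level.items()})
        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            # Support of the union is the AND of the two parent bitmaps.
            bits = bits_a & bits_b
            if support(bits) >= min_support:
                next_level[candidate] = bits
        level = next_level
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = frequent_itemsets(transactions, min_support=3)
```

With a minimum support of 3, `result` holds the four single items plus the pair {bread, milk}. Each candidate is checked with one AND and a bit count instead of a new pass over the transactions.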

Bibliographic record

Source
International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2007); 16-19 December 2007; Birmingham (GB) | 2007 | pp. 517-526 | 2 pages in total
Conference location: Birmingham (GB)
Authors
S.M. Fakhrahmad; M. Zolghadri Jahromi; M.H. Sadreddini
Author affiliation

Faculty member in Department of Computer Eng., Islamic Azad University of Shiraz, Iran;

Original format: PDF
Text language: English (eng)
CLC classification: Artificial intelligence theory
Keywords
data mining; frequent itemset; association rule mining; transactional database; logical operations;

Date indexed: 2022-08-26 13:57:03

Similar literature

1. Materialized View Selection for a Data Warehouse Using Frequent Itemset Mining [J]. Mohammad Karim Sohrabi, Vahid Ghods. Journal of Computers. 2016, No. 2
2. A novel algorithm for frequent itemset mining in data warehouses [J]. XU Li-jun, XIE Kang-lin. Journal of Zhejiang University. Science. 2006, No. 2
3. A novel algorithm for frequent itemset mining in data warehouses [J]. XU Li-jun, XIE Kang-lin. Journal of Zhejiang University Science: An international applied physics & engineering journal. 2006, No. 2
4. Mining Frequent Itemsets in Large Data Warehouses: A Novel Approach Proposed for Sparse Data Sets [C]. S.M. Fakhrahmad, M. Zolghadri Jahromi, M.H. Sadreddini. International Conference on Intelligent Data Engineering and Automated Learning. 2007
5. Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining [D]. Hao, Boyu. 2010
6. Genetic Programming and Frequent Itemset Mining to Identify Feature Selection Patterns of iEEG and fMRI Epilepsy Data [O]. Otis Smart, Lauren Burrell
7. D-GENE: Deferring the GENEration of Power Sets for Discovering Frequent Itemsets in Sparse Big Data [O]. Muhammad Yasir, Muhammad Asif Habib, Muhammad Ashraf. 2020
8. Reconstruction of Tomographic Images from Sparse Data Sets By a New Finite Element Maximum Entropy Approach [R]. Smith, R. T., Zoltani, C. K., Klem, G. J. 1987
