International Conference on Computer and Electrical Engineering; ICCEE '09

Developing Novel and Effective Approach for Association Rule Mining Using Progressive Sampling



Abstract

A challenging task in data mining is discovering association rules from a large database. Most existing association rule mining algorithms make repeated passes over the entire database to determine the frequent itemsets, which is likely to incur extremely high I/O overhead. A simple but effective way to overcome this problem is to sample the database so that the rules mined from the sample are as accurate as achievable on the large database. Numerous researchers have proposed sampling approaches for faster and more efficient mining of association rules. In this paper, we propose a novel and effective progressive sampling-based approach for mining association rules from a large database. Initially, frequent patterns are extracted using the Apriori algorithm from an initial sample that is selected based on the temporal characteristics and the size of the database. Using the frequent itemsets generated, the negative border of the initial sample is obtained and sorted. Subsequently, the midpoint itemset of the sorted negative border is scanned against the full database to check whether it is frequent. Based on the support computed for the midpoint itemset, either the sample size is progressively increased to find an optimal sample, or the current sample is treated as optimal and association rules are mined from it. The experimental results demonstrate the efficiency of the proposed progressive sampling approach in mining association rules.
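
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the negative border is sorted, how the initial sample is chosen from temporal characteristics, or which way the midpoint test decides. The sketch assumes a random initial sample, a border sorted by descending sample support, and a rule that grows the sample when the midpoint itemset turns out to be frequent in the full database. All function and parameter names (`progressive_sample_mine`, `initial_size`, `step`, and so on) are hypothetical.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return all frequent itemsets (frozensets) using level-wise Apriori."""
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    frequent = set(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {c for c in candidates
                 if support(c, transactions) >= min_support}
        frequent |= level
        k += 1
    return frequent


def negative_border(frequent, items):
    """Itemsets that are not frequent but whose proper subsets all are."""
    border = {frozenset([i]) for i in items} - frequent
    for f in frequent:
        for i in items - f:
            c = f | {i}
            if c not in frequent and all(
                    frozenset(s) in frequent
                    for s in combinations(c, len(c) - 1)):
                border.add(c)
    return border


def progressive_sample_mine(database, min_support, initial_size, step, seed=0):
    """Grow a sample until the midpoint border itemset is infrequent overall.

    Assumptions (not from the abstract): random sampling order, border sorted
    by descending sample support, and "midpoint frequent in the full database"
    taken as the signal that the sample is still too small.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    order = list(range(len(database)))
    random.Random(seed).shuffle(order)
    items = {i for t in database for i in t}
    size = min(initial_size, len(database))
    while True:
        sample = [database[j] for j in order[:size]]
        frequent = apriori(sample, min_support)
        border = sorted(negative_border(frequent, items),
                        key=lambda s: (-support(s, sample), sorted(s)))
        if not border or size == len(database):
            return frequent, size
        midpoint = border[len(border) // 2]
        # One scan of the full database for a single itemset.
        if support(midpoint, database) < min_support:
            return frequent, size
        size = min(size + step, len(database))


# Tiny illustrative database; items are single-character strings.
db = [frozenset("ab"), frozenset("ab"), frozenset("ac"), frozenset("b")]
itemsets, used = progressive_sample_mine(db, 0.5, initial_size=2, step=1)
```

The point of the design is that each round scans the full database for only one itemset (the midpoint), rather than recounting every candidate, which is where the I/O saving over repeated full passes would come from.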
