
Evaluation of sampling for data mining of association rules


Abstract

The discovery of association rules is a prototypical problem in data mining. Current algorithms for mining association rules make repeated passes over the database to determine the commonly occurring itemsets (sets of items). For large databases, the I/O overhead of scanning the database can be extremely high. The authors show that random sampling of the transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. It may also make it possible to keep the sampled database resident in main memory. Furthermore, the authors show that sampling can accurately represent the data patterns in the database with high confidence. They experimentally evaluate the effectiveness of sampling on different databases and study the relationship between the performance, accuracy, and confidence of the chosen sample.
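The idea of mining a random sample instead of the full database can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which bound relates sample size to accuracy and confidence, so the sketch assumes the standard Hoeffding inequality, and it counts itemsets by brute force rather than with a level-wise algorithm such as Apriori. The function names `hoeffding_sample_size`, `sample_transactions`, and `frequent_itemsets` are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def hoeffding_sample_size(epsilon, delta):
    """Sample size n such that the estimated support of one fixed itemset
    is within +/- epsilon of its true support with probability >= 1 - delta.
    Assumption: Hoeffding's inequality, P(|p_hat - p| >= eps) <= 2*exp(-2*n*eps^2).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def sample_transactions(transactions, n, seed=None):
    """Draw n transactions uniformly at random without replacement."""
    if n >= len(transactions):
        return list(transactions)
    rng = random.Random(seed)
    return rng.sample(transactions, n)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting for itemsets of up to max_size items.
    Returns {itemset (sorted tuple): support} for itemsets whose support
    is at least min_support. Illustration only; not Apriori.
    """
    total = len(transactions)
    if total == 0:
        return {}
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


if __name__ == "__main__":
    # Hypothetical toy database; real workloads have far more transactions.
    database = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]] * 10000
    n = hoeffding_sample_size(epsilon=0.01, delta=0.05)
    sample = sample_transactions(database, n, seed=0)
    print(n, frequent_itemsets(sample, min_support=0.4))
```

The sample size here depends only on the tolerated error and the confidence level, not on the size of the database, which is why a sample can be far smaller than the original data and may fit in main memory. Tightening `epsilon` or `delta` grows the sample, which is the trade-off between performance, accuracy, and confidence that the abstract refers to.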
