Conference proceedings: Advances in Knowledge Discovery and Data Mining

Efficient Rule Retrieval and Postponed Restrict Operations for Association Rule Mining

Abstract

Knowledge discovery in databases is a complex, iterative, and highly interactive process. When mining for association rules, however, interactivity is typically stifled by the execution times of the rule generation algorithms. Our approach is to accept a single, possibly expensive mining run and to answer all subsequent mining queries interactively from a sophisticated rule cache. Two aspects are critical. First, access to the cache must be efficient and convenient. We therefore enrich the basic association mining framework with descriptions of items through application-dependent attributes, and we extend current mining query languages to handle these attributes through "exist" and "any" quantifiers. Second, the cache must be able to answer a broad variety of queries without rerunning the mining algorithm. A main contribution of this paper is to show how restrict operations on the transactions can be postponed from rule generation to rule retrieval from the cache. That is, without actually rerunning the algorithm, we efficiently construct from the cache those rules that would have been generated had the mining algorithm been run on only a subset of the transactions. In addition, we describe how we implemented our ideas on a conventional relational database system. We evaluate the response times of our prototype in a pilot application at DaimlerChrysler; it easily satisfies the demands of interactive data mining.
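
The abstract does not describe how the rule cache is organized, so the following Python sketch is only an illustration of the general idea of a postponed restrict operation, not the authors' method. It assumes that each transaction carries one categorical attribute (called `segment` here, a hypothetical name) and that the single mining run stores itemset counts split by that attribute. A later query restricted to some segments then sums the cached counts and recomputes support and confidence without touching the transactions again. The function names `build_cache` and `rules_from_cache` are invented for this example, and the cache counts every itemset up to a small size, which a real system would replace with a thresholded mining run.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cache(transactions, max_size=3):
    """Single expensive pass: count itemsets per segment (assumed layout)."""
    counts = defaultdict(Counter)  # itemset -> Counter(segment -> count)
    sizes = Counter()              # segment -> number of transactions
    for segment, items in transactions:
        sizes[segment] += 1
        unique = sorted(set(items))
        for k in range(1, max_size + 1):
            for combo in combinations(unique, k):
                counts[frozenset(combo)][segment] += 1
    return dict(counts), dict(sizes)


def rules_from_cache(cache, segments, min_support, min_confidence):
    """Answer a query restricted to `segments` using only cached counts."""
    counts, sizes = cache
    total = sum(sizes.get(s, 0) for s in segments)
    if total == 0:
        return []

    def count(itemset):
        per_segment = counts.get(itemset, {})
        return sum(per_segment.get(s, 0) for s in segments)

    rules = []
    for itemset in counts:
        if len(itemset) < 2:
            continue
        n_all = count(itemset)
        if n_all == 0 or n_all / total < min_support:
            continue
        # Split the itemset into every non-empty body and head.
        for k in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), k):
                body = frozenset(body_items)
                confidence = n_all / count(body)
                if confidence >= min_confidence:
                    rules.append((body, itemset - body, n_all / total, confidence))
    return rules


# Hypothetical data: (segment, items) pairs.
transactions = [
    ("A", ["x", "y"]),
    ("A", ["x", "y", "z"]),
    ("A", ["x"]),
    ("B", ["x", "z"]),
    ("B", ["y", "z"]),
    ("B", ["z"]),
]
cache = build_cache(transactions)
for body, head, support, confidence in rules_from_cache(cache, {"A"}, 0.5, 0.9):
    print(sorted(body), "->", sorted(head), round(support, 2), round(confidence, 2))
```

In this sketch the restricted query returns the same rules as rebuilding the cache from the segment "A" transactions alone, because nothing was pruned when the cache was built. With a support threshold applied during the mining run, itemsets that are frequent only inside a subset could be missing from the cache, which is the kind of issue a real postponed-restrict design has to address.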