International Conference on Discovery Science

An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations

Abstract

Given a transaction database as a global set of transactions, and a local database obtained from it by some conditioning, we consider pairs of itemsets whose degree of correlation is higher in the local database than in the global one. The problem of finding paired itemsets that are highly correlated in a single database is known as Discovery of Correlation, and several algorithms that search for such characteristic paired itemsets have already been proposed. However, even paired itemsets that are not characteristic in the local database can be meaningful, provided their degree of correlation rises much higher in the local database than in the global one. Such pairs can be implicit, hidden evidence that something particular to the local database is occurring, even though they have not yet become characteristic there. From this viewpoint, we previously proposed measuring the significance of a paired itemset by the difference between its two correlations, before and after conditioning to the local database, and defined the notion of DC pairs: pairs whose difference of correlations is high. Since a DC pair can be regarded as a compound itemset consisting of two component itemsets, there are two basic strategies for finding them. One strategy examines the compound itemsets first and then the components; the other examines the component itemsets first and then the compound ones. Under the former strategy, which we have already proposed and tested for effectiveness, we must enumerate a large number of candidate compound itemsets that cannot be decomposed into components. For this reason, this paper presents a new algorithm based on the second strategy. It first enumerates candidate component itemsets, using a new pruning rule that cuts off useless components. It then forms compound itemsets by combining the components thus detected, using a constraint that prevents the algorithm from checking meaningless combinations.
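To make the component-first idea concrete, here is a minimal sketch under several assumptions. The abstract does not specify the correlation measure, the exact definition of the DC score, the pruning rule, or the combination constraint. The sketch therefore assumes the phi coefficient as the correlation measure and defines the DC score as local correlation minus global correlation. It uses a minimum local support threshold as a stand-in for the pruning rule and a "components must be disjoint" check as a stand-in for the combination constraint. The names `support`, `phi`, `dc_score`, `component_first`, the thresholds, and the toy data are all hypothetical and are not the authors' method.

```python
from itertools import combinations
from math import sqrt

# Toy data (hypothetical): the local database is the subset of global
# transactions that contain the conditioning item "c".
GLOBAL_DB = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b", "c"}, {"c"}, {"c"},
    {"a"}, {"a"}, {"b"}, {"b"},
)]
LOCAL_DB = [t for t in GLOBAL_DB if "c" in t]


def support(db, itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def phi(db, x, y):
    """Phi coefficient between 'contains x' and 'contains y' (assumed measure)."""
    px, py, pxy = support(db, x), support(db, y), support(db, x | y)
    denom = sqrt(px * (1 - px) * py * (1 - py))
    # An itemset present in all or no transactions has no variance: treat as uncorrelated.
    return 0.0 if denom == 0 else (pxy - px * py) / denom


def dc_score(global_db, local_db, x, y):
    """Hypothetical DC score: correlation after conditioning minus correlation before."""
    return phi(local_db, x, y) - phi(global_db, x, y)


def component_first(global_db, local_db, items, max_size=2,
                    min_local_support=0.1, threshold=0.3):
    """Component-first search: enumerate components, then combine them into pairs."""
    # Step 1: enumerate component itemsets. The minimum local support test is a
    # placeholder for the paper's pruning rule, which the abstract does not state.
    components = [
        frozenset(c)
        for k in range(1, max_size + 1)
        for c in combinations(sorted(items), k)
        if support(local_db, frozenset(c)) >= min_local_support
    ]
    # Step 2: combine surviving components into compound pairs. Skipping
    # overlapping components stands in for the paper's combination constraint.
    pairs = []
    for x, y in combinations(components, 2):
        if x & y:
            continue
        score = dc_score(global_db, local_db, x, y)
        if score >= threshold:
            pairs.append((x, y, score))
    # Highest difference of correlations first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    for x, y, s in component_first(GLOBAL_DB, LOCAL_DB, {"a", "b"}):
        print(sorted(x), sorted(y), round(s, 3))
```

On the toy data, `{a}` and `{b}` are uncorrelated in the global database (phi = 0) but perfectly correlated in the local database (phi = 1), so the pair is reported with a score of 1.0. This is the kind of pair the abstract describes: its correlation rises sharply after conditioning. Enumerating and pruning components before any pair is formed means that compound itemsets that cannot be decomposed into two surviving components are never generated.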