Source: International Conference on Discovery Science (DS 2005); 8-11 October 2005; Singapore (SG)

An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations



Abstract

Given a transaction database as a global set of transactions and a local database obtained from it by some conditioning, we consider pairs of itemsets whose degree of correlation is higher in the local database than in the global one. The problem of finding paired itemsets with high correlation in a single database is known as Discovery of Correlation, and several algorithms that search for such characteristic paired itemsets have already been proposed. However, even paired itemsets that are not characteristic in the local database are meaningful, provided their degree of correlation is much higher in the local database than in the global one. They can be implicit, hidden evidence that something particular to the local database is occurring, even though they are not yet recognized as characteristic there. From this viewpoint, we have previously proposed measuring the significance of paired itemsets by the difference between the two correlations before and after conditioning to the local database, and defined the notion of DC pairs, whose differences of correlations are high. Since DC pairs can be regarded as compound itemsets consisting of two component itemsets, there are two basic strategies for finding them. One strategy examines the compound itemsets first and then the components, while the other examines the component itemsets first and then the compound ones. Under the former strategy, which we have already proposed and tested for effectiveness, we must enumerate a large number of candidate compound itemsets that cannot be decomposed into components. For this reason, this paper presents a new algorithm that follows the second strategy. It first enumerates possible component itemsets, using a new pruning rule to cut off useless components. It then forms compound itemsets by combining the components thus detected, using a constraint that prevents the algorithm from checking meaningless combinations.
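The component-first strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the correlation measure, the pruning rule, or the combination constraint. The sketch assumes a lift-style correlation, P(X and Y) / (P(X) P(Y)), treats conditioning as "keep the transactions that contain a given itemset", substitutes a plain local-support threshold for the pruning rule, and substitutes a disjointness check for the combination constraint. All function and parameter names (`support`, `correlation`, `candidate_components`, `dc_pairs`, `min_sup`, `min_diff`) are hypothetical.

```python
from itertools import combinations


def support(db, itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def correlation(db, x, y):
    """Lift-style correlation (an assumed measure); 0.0 when undefined."""
    px, py = support(db, x), support(db, y)
    if px == 0 or py == 0:
        return 0.0
    return support(db, x | y) / (px * py)


def candidate_components(local_db, min_sup, max_size):
    """Phase 1: enumerate component itemsets up to max_size items.

    The support threshold is only a stand-in for the paper's pruning rule,
    which the abstract does not describe.
    """
    items = sorted(set().union(*local_db)) if local_db else []
    components = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            if support(local_db, itemset) >= min_sup:
                components.append(itemset)
    return components


def dc_pairs(global_db, condition, components, min_diff):
    """Phase 2: combine components and keep pairs whose correlation
    rises by at least min_diff from the global to the local database."""
    # Assumed form of conditioning: transactions containing `condition`.
    local_db = [t for t in global_db if condition <= t]
    pairs = []
    for x, y in combinations(components, 2):
        if x & y:  # assumed constraint: skip overlapping components
            continue
        diff = correlation(local_db, x, y) - correlation(global_db, x, y)
        if diff >= min_diff:
            pairs.append((x, y, diff))
    return pairs
```

With transactions represented as `frozenset` objects, `candidate_components` is called on the local database and its result is passed to `dc_pairs` together with the global database and the conditioning itemset. A pair is reported on the strength of the increase in correlation alone, so it can qualify even when its local correlation is not high in absolute terms, which is the kind of implicit pair the abstract describes.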
