An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations

Abstract

Given a transaction database as a global set of transactions, and a local database obtained by applying some condition to the global one, we consider pairs of itemsets whose degree of correlation is higher in the local database than in the global one. The problem of finding paired itemsets with high correlation in a single database is known as Discovery of Correlation, and several algorithms for finding such characteristic itemset pairs have already been proposed. However, even itemset pairs that are not characteristic in the local database can be meaningful, provided their degree of correlation is much higher in the local database than in the global one. Such pairs can serve as implicit, hidden evidence that something particular to the local database is occurring, even though they have not yet emerged as characteristic pairs there. From this viewpoint, we previously proposed measuring the significance of an itemset pair by the difference between its correlations before and after conditioning to the local database, and defined DC pairs as pairs whose difference of correlations is high. Since a DC pair can be regarded as a compound itemset consisting of two component itemsets, there are two basic strategies for finding DC pairs: one examines the compound itemsets first and then the components, while the other examines the component itemsets first and then the compound ones. The former strategy, which we have already proposed and tested for effectiveness, has to enumerate many candidate compound itemsets that cannot be decomposed into components. For this reason, this paper presents a new algorithm based on the second strategy. It first enumerates possible component itemsets, using a new pruning rule to cut off useless components. It then forms compound itemsets by combining the components thus detected, using a constraint that prevents the algorithm from checking meaningless combinations.
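The component-first strategy can be illustrated with a small brute-force sketch. The abstract does not define the correlation measure, the pruning rule, or the combination constraint, so this sketch substitutes three assumptions that are not the authors' definitions: correlation is measured by lift, P(X ∪ Y) / (P(X) · P(Y)); components are pruned by a hypothetical minimum local support threshold `min_sup`; and only disjoint components are combined. All function and parameter names (`support`, `correlation`, `dc_score`, `components`, `dc_pairs`, `max_size`, `min_diff`) are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]


def support(db: Sequence[Itemset], items: Itemset) -> float:
    """Relative support: fraction of transactions containing every item."""
    if not db:
        return 0.0
    return sum(1 for t in db if items <= t) / len(db)


def correlation(db: Sequence[Itemset], x: Itemset, y: Itemset) -> float:
    """ASSUMED measure (lift): P(X u Y) / (P(X) * P(Y)); 0.0 if undefined."""
    px, py = support(db, x), support(db, y)
    if px == 0.0 or py == 0.0:
        return 0.0
    return support(db, x | y) / (px * py)


def dc_score(global_db: Sequence[Itemset], local_db: Sequence[Itemset],
             x: Itemset, y: Itemset) -> float:
    """Difference of correlations: local (after conditioning) minus global."""
    return correlation(local_db, x, y) - correlation(global_db, x, y)


def components(local_db: Sequence[Itemset], min_sup: float,
               max_size: int) -> List[Itemset]:
    """Step 1: enumerate candidate component itemsets (brute force).

    HYPOTHETICAL pruning: keep only itemsets whose local support >= min_sup.
    """
    items = sorted(set().union(*local_db))
    result: List[Itemset] = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(local_db, s) >= min_sup:
                result.append(s)
    return result


def dc_pairs(global_db: Sequence[Itemset], local_db: Sequence[Itemset],
             min_sup: float, max_size: int,
             min_diff: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Step 2: combine components into compound pairs and score them."""
    comps = components(local_db, min_sup, max_size)
    found: List[Tuple[Itemset, Itemset, float]] = []
    for x, y in combinations(comps, 2):
        if x & y:  # ASSUMED constraint: skip overlapping components
            continue
        diff = dc_score(global_db, local_db, x, y)
        if diff >= min_diff:
            found.append((x, y, diff))
    return found


if __name__ == "__main__":
    # Toy global database; the local database is conditioned on item "c".
    global_db = [frozenset(t) for t in (
        {"a", "b", "c"}, {"a", "b", "c"}, {"c"}, {"c"},
        {"a", "d"}, {"b", "d"}, {"a", "d"}, {"b", "d"},
    )]
    local_db = [t for t in global_db if "c" in t]
    for x, y, diff in dc_pairs(global_db, local_db, min_sup=0.5,
                               max_size=1, min_diff=0.5):
        print(sorted(x), sorted(y), diff)  # prints: ['a'] ['b'] 1.0
```

In this toy data, items a and b are independent in the global database (lift 1.0) but always co-occur in the local database (lift 2.0), so ({a}, {b}) is reported with a difference of 1.0, while pairs involving the conditioning item c score 0.0. A real implementation would replace the exhaustive enumeration in `components` with the paper's own pruning rule.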

Bibliographic record

Source: International Conference on Discovery Science (DS 2005); 2005-10-08 to 2005-10-11; Singapore (SG) | 2005 | pp. 227-240 | 14 pages
Conference location: Singapore (SG)
Authors: Tsuyoshi Taniguchi; Makoto Haraguchi
Affiliation: Division of Computer Science, Hokkaido University, N-14 W-9, Sapporo 060-0814, Japan
Original format: PDF
Language: English
CLC classification: Artificial intelligence theory
Date added: 2022-08-26 13:46:53

Similar documents

1. Paradigm and performance analysis of distributed frequent itemset mining algorithms based on Mapreduce [J]. Xiao Wen, Hu Juan. Microprocessors and Microsystems. 2021, issue Apra.
2. Mining interesting infrequent and frequent itemsets based on multiple level minimum supports and minimum correlation strength [J]. Xiangjun Dong, Chuanlu Liu. International Journal of Services Technology and Management. 2015, issue 4a6.
3. CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets [J]. Fatemi Seyed Mohsen, Hosseini Seyed Mohsen, Kamandi Ali. International Journal of Machine Learning and Cybernetics. 2021, no. 2.
4. An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations [C]. Tsuyoshi Taniguchi, Makoto Haraguchi. International Conference on Discovery Science. 2005.
5. New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases [D]. Peterson, Erich Allen. 2012.
6. Efficiently Hiding Sensitive Itemsets with Transaction Deletion Based on Genetic Algorithms [O]. Chun-Wei Lin, Binbin Zhang, Kuo-Tung Yang.
7. An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations [O]. Taniguchi, Tsuyoshi, Haraguchi, Makoto. 2005.
