Diverse subgroup set discovery

Van Leeuwen M.; Knobbe A.

Abstract

Large data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly redundant result sets, while ignoring many potentially interesting results. These problems are particularly apparent with subgroup discovery (SD) and its generalisation, exceptional model mining. To address this, we introduce subgroup set discovery: one should not consider individual subgroups, but sets of subgroups. We consider three degrees of redundancy, and propose corresponding heuristic selection strategies in order to eliminate redundancy. By incorporating these (generic) subgroup selection methods in a beam search, the aim is to improve the balance between exploration and exploitation. The proposed algorithm, dubbed DSSD for diverse subgroup set discovery, is experimentally evaluated and compared to existing approaches. For this, a variety of target types with corresponding datasets and quality measures is used. The subgroup sets that are discovered by the competing methods are evaluated primarily on the following three criteria: (1) diversity in the subgroup covers (exploration), (2) the maximum quality found (exploitation), and (3) runtime. The results show that DSSD outperforms each traditional SD method on all or a (non-empty) subset of these criteria, depending on the specific setting. The more complex the task, the larger the benefit of using our diverse heuristic search turns out to be.
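
The abstract describes selecting subgroups as a set so that near-duplicate variants do not crowd out other results. The following is a minimal sketch of one way such a cover-based selection step could work, assuming a multiplicative discount on rows that are already covered. The names `select_diverse`, `cover_weight`, and `alpha`, and the toy candidates, are illustrative assumptions and are not taken from the abstract. The sketch shows only the selection step, not a full beam search.

```python
from typing import Dict, FrozenSet, List, Tuple

# (description, quality, ids of the rows the subgroup covers) -- hypothetical layout
Subgroup = Tuple[str, float, FrozenSet[int]]


def cover_weight(cover: FrozenSet[int], counts: Dict[int, int], alpha: float) -> float:
    """Mean of alpha**(times a row is already covered) over the subgroup's rows."""
    if not cover:
        return 0.0
    return sum(alpha ** counts.get(row, 0) for row in cover) / len(cover)


def select_diverse(candidates: List[Subgroup], k: int, alpha: float = 0.9) -> List[Subgroup]:
    """Greedily pick up to k subgroups, discounting quality by overlap with earlier picks."""
    remaining = list(candidates)
    selected: List[Subgroup] = []
    counts: Dict[int, int] = {}  # how often each row is covered by the selection so far
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda s: s[1] * cover_weight(s[2], counts, alpha))
        remaining.remove(best)
        selected.append(best)
        for row in best[2]:
            counts[row] = counts.get(row, 0) + 1
    return selected


# Toy data (assumed): two near-identical subgroups and one covering different rows.
CANDIDATES: List[Subgroup] = [
    ("age > 40", 0.90, frozenset({0, 1, 2, 3})),
    ("age > 41", 0.89, frozenset({0, 1, 2, 3})),
    ("city = X", 0.70, frozenset({6, 7, 8})),
]

picked = select_diverse(CANDIDATES, k=2, alpha=0.5)
print([s[0] for s in picked])  # ['age > 40', 'city = X']
```

A plain top-2 ranking by quality would return both `age` variants here. With the discount, the second variant scores 0.89 × 0.5 = 0.445 and loses to the subgroup that covers rows not yet selected.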

Bibliographic details

Source
Data Mining and Knowledge Discovery | 2012, Issue 2 | 35 pages
Authors
Van Leeuwen M.; Knobbe A.
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords
Diversity; Exceptional model mining; Heuristic search; Pattern selection; Subgroup set discovery;
