Reasoning about sets using redescription mining

Abstract

Redescription mining is a newly introduced data mining problem that seeks subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to finding equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, which builds features from given descriptors that mutually reinforce each other. In this paper, we present redescription mining as a tool for reasoning about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms that mine all minimal (non-redundant) redescriptions underlying a dataset using the notion of minimal generators of closed itemsets. We also show how these algorithms can be used interactively, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that lets a biologist define a vocabulary of sets over a domain of genes and reason about these sets, yielding significant biological insight.
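
To make the idea of a redescription concrete, the following is a minimal brute-force sketch in Python. It is not the minimal-generator algorithm the abstract refers to. It only enumerates single sets and pairwise intersections, and reports pairs of expressions over disjoint vocabularies that select the same objects. The gene and set names, the function names, and the `min_jaccard` parameter are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def candidate_expressions(named_sets):
    """Yield (expression, names used, selected objects) for every single
    set and every pairwise intersection. This is an illustrative
    expression language only, far smaller than a real miner's."""
    for name, members in named_sets.items():
        yield name, frozenset([name]), frozenset(members)
    for (n1, s1), (n2, s2) in combinations(named_sets.items(), 2):
        yield f"{n1} AND {n2}", frozenset([n1, n2]), frozenset(s1) & frozenset(s2)


def find_redescriptions(named_sets, min_jaccard=1.0):
    """Return (expr1, expr2, score) for expression pairs that use disjoint
    set names and whose selected objects have Jaccard >= min_jaccard."""
    # Ignore expressions that select nothing.
    exprs = [e for e in candidate_expressions(named_sets) if e[2]]
    results = []
    for (e1, used1, objs1), (e2, used2, objs2) in combinations(exprs, 2):
        if used1 & used2:
            continue  # the two sides must not share a descriptor
        score = jaccard(objs1, objs2)
        if score >= min_jaccard:
            results.append((e1, e2, score))
    return results


# Hypothetical vocabulary of gene sets.
gene_sets = {
    "heat_shock": {"g1", "g2", "g3", "g4"},
    "membrane": {"g2", "g3", "g5"},
    "stress_response": {"g2", "g3"},
    "kinase": {"g4", "g6"},
}

redescriptions = find_redescriptions(gene_sets)
for left, right, score in redescriptions:
    print(f"{left}  <=>  {right}  (Jaccard = {score:.2f})")
```

With this toy data, the only exact redescription is that `stress_response` selects the same genes as `heat_shock AND membrane`. Lowering `min_jaccard` below 1.0 would also admit approximate redescriptions. The exhaustive pairwise comparison is workable only for small vocabularies, which is presumably one reason a closed-itemset formulation is attractive.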
