Journal: Intelligent Data Analysis

Constraint-based concept mining and its application to microarray data analysis

Abstract

We are designing new data mining techniques on boolean contexts to identify a priori interesting bi-sets, i.e., sets of objects (or transactions) and associated sets of attributes (or items). This improves the state of the art in many application domains where transactional/boolean data are to be mined (e.g., basket analysis, WWW usage mining, gene expression data analysis). The so-called (formal) concepts are important special cases of a priori interesting bi-sets: they associate closed sets on both dimensions by means of the Galois operators. Concept mining in boolean data is tractable provided that at least one of the dimensions (number of objects or number of attributes) is small enough and the data is not too dense. Otherwise, the task is extremely hard. Furthermore, it is important to let users define constraints on the desired bi-sets and to use these constraints during the extraction, which increases both the efficiency and the a priori interestingness of the extracted patterns. This leads us to the design of a new algorithm, called D-Miner, for mining concepts under constraints. We provide an experimental validation on benchmark data sets. Moreover, we introduce an original data mining technique for microarray data analysis: we record not only boolean expression properties of genes but also biological information about transcription factors. In such a context, D-Miner can be used for concept mining under constraints and outperforms the other studied algorithms. We also show that data enrichment is useful for evaluating the biological relevancy of the extracted concepts.
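
To make the notions of Galois operators and formal concepts concrete, here is a minimal Python sketch on a toy boolean context. It is not D-Miner: it enumerates concepts by brute force over attribute subsets and applies size constraints only as a filter after closure, whereas the abstract describes using constraints during the extraction. The context `CONTEXT`, the function names, and the `min_objects`/`min_attributes` constraints are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy boolean context: each object maps to the attributes it has.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"b", "c"},
    "o4": {"c"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def common_attributes(objects):
    """Galois operator: attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def common_objects(attributes):
    """Galois operator: objects that carry every attribute in `attributes`."""
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def concepts(min_objects=0, min_attributes=0):
    """Brute-force concept enumeration with simple size constraints.

    Each attribute subset is closed with the two Galois operators, which
    yields a pair (extent, intent) where both sets are closed. The size
    constraints are checked afterwards; this is exponential in the number
    of attributes and only suitable for tiny examples.
    """
    found = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            extent = common_objects(frozenset(subset))
            intent = common_attributes(extent)
            if len(extent) >= min_objects and len(intent) >= min_attributes:
                found.add((extent, intent))
    return found


if __name__ == "__main__":
    for extent, intent in sorted(concepts(2, 2), key=lambda c: sorted(c[0])):
        print(sorted(extent), sorted(intent))
```

Calling `concepts()` without constraints returns every concept of the toy context; passing `min_objects=2, min_attributes=2` keeps only the bi-sets with at least two objects and two attributes.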