...
Journal: BioData Mining

RASMA: a reverse search algorithm for mining maximal frequent subgraphs


Abstract

Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as frequent subgraph mining. Maximal frequent subgraphs form a representative subset of the frequent subgraphs: a frequent subgraph is maximal if it has no frequent supergraph. In bioinformatics, methods for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, as reflected in the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks aids the discovery of biological modules and biological signatures for gene expression and disease classification.

We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate the connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs efficiently. Because enumerating all frequent subgraphs while mining for the maximal ones is computationally prohibitive, RASMA employs several pruning strategies that substantially improve its overall runtime.

Experimental results show that, on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs.

Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that the frequent subnetworks are highly enriched with known biological ontologies.
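The definitions in the abstract (frequent subgraph, maximal frequent subgraph) can be made concrete with a small sketch. It is a naive level-wise enumeration for illustration only. It is not RASMA: it uses neither the reverse-search enumerator nor the pruning strategies the abstract mentions. It assumes that genes are uniquely labeled and shared across all networks, so that a subnetwork is fully identified by its set of edges. The function names (`edge`, `support`, `frequent_connected_subgraphs`, `maximal_only`) and the `min_support` threshold are hypothetical choices made for this example.

```python
def edge(u, v):
    """An undirected edge between two genes, stored as a frozenset."""
    return frozenset((u, v))


def support(edge_set, networks):
    """Number of networks that contain every edge of edge_set."""
    return sum(1 for net in networks if edge_set <= net)


def frequent_connected_subgraphs(networks, min_support):
    """Naive level-wise growth (assumption: not the RASMA enumerator).

    Starts from frequent single edges and repeatedly adds one edge that
    touches the current subgraph, keeping only candidates whose support
    reaches min_support. Because support can only drop when edges are
    added, every frequent connected edge set is reached this way.
    """
    all_edges = set().union(*networks)
    level = {
        frozenset([e])
        for e in all_edges
        if support(frozenset([e]), networks) >= min_support
    }
    found = set()
    while level:
        found |= level
        next_level = set()
        for sub in level:
            nodes = set().union(*sub)
            for e in all_edges - sub:
                if e & nodes:  # e shares a gene with sub, so sub + e stays connected
                    candidate = sub | {e}
                    if support(candidate, networks) >= min_support:
                        next_level.add(candidate)
        level = next_level
    return found


def maximal_only(frequent):
    """Keep the frequent subgraphs that have no frequent proper supergraph."""
    return {s for s in frequent if not any(s < t for t in frequent)}


# Three toy coexpression networks over genes A, B, C, D.
networks = [
    {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    {edge("A", "B"), edge("B", "C"), edge("A", "D")},
    {edge("B", "C"), edge("C", "D")},
]

frequent = frequent_connected_subgraphs(networks, min_support=2)
maximal = maximal_only(frequent)
for sub in sorted(maximal, key=lambda s: sorted(map(sorted, s))):
    print(sorted(sorted(e) for e in sub))
```

With a threshold of 2, the paths A-B-C and B-C-D each occur in two of the three toy networks, while the full path A-B-C-D occurs in only one, so the two shorter paths are the maximal frequent subgraphs. The cost of this sketch grows with the number of frequent subgraphs, which is the blow-up the abstract says RASMA's pruning is meant to avoid.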
