Mining Structural Patterns in Biological Networks.

Abstract

Biological networks capture information about the way biomolecules, such as genes, proteins, and metabolites, interact with each other. Discovering interesting patterns in them enables a better understanding of biological processes such as cellular organization, transcription regulation, and phenotypic evolution. Biological networks can be modeled as graphs, with vertices representing biomolecules and edges representing the interactions between them. Graph mining algorithms have been used to find frequently occurring subgraphs in biological graphs. Because these subgraphs may be nothing more than "overrepresented patterns", such algorithms are sometimes not considered very useful. What is needed is an algorithm that finds not only frequently occurring patterns, but also patterns that actually characterize biological networks and allow them to be discriminated from each other.

For many biological networks, a number of attributes are known about the constituent biomolecules beyond their names. For example, besides the name of each protein in a PPI network, we also know, for many of the proteins, the functions they perform and the cellular processes they are involved in. Proteins always perform more than one molecular function and are involved in multiple cellular processes [67]. The information provided by these additional attributes is currently not taken into consideration by graph mining algorithms, even though it can be very useful. To take the multiple attributes of the constituent biomolecules into account, we model the biological network as a multiple-attribute graph using gene ontology, so that more information than the direct interactions between biomolecules can be used in the graph mining process. The multiple-attribute graph representation allows vertices to represent not only biomolecules but also the attributes associated with them. Subgraphs in a multiple-attribute graph may relate to each other, and if a node is used to represent a subgraph, a hierarchical multiple-attribute graph can also be formed and mined for patterns.

In this thesis, we propose a graph mining algorithm that can be used to discover interesting patterns in such graphs. The algorithm is called MISPAG (Mining Interesting Structural Patterns in Attributed Graphs). MISPAG discovers interesting subgraphs using an interestingness measure that determines whether a given subgraph occurs more, or less, frequently in a graph than expected. The interestingness measure can take into consideration the multiple attributes of the constituent biomolecules of a biological network. It can also be used to filter out subgraphs that do not contribute to the unique characterization and discrimination of a network or a class of networks, even if they occur frequently according to some user threshold. MISPAG can be adapted into different algorithms that are suitable for problems such as motif discovery, network identification, protein function prediction, molecular classification, and protein complex discovery. These algorithms have been implemented and tested with real biological data in different application areas. Experimental results show that the proposed algorithms can effectively uncover patterns that are biologically meaningful for deciphering the biological and structural relationships in the networks, and for predicting un-annotated functions and features of proteins, genes, and chemical compounds.
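The abstract does not give the definition of the MISPAG interestingness measure, so the following is only a minimal sketch of the general idea of comparing observed with expected pattern frequency in an attributed graph. Everything in it is an assumption made for illustration: the toy proteins and attribute labels, the function names `pair_patterns` and `interestingness`, the restriction to single-edge attribute-pair patterns, and the null model (edges placed uniformly at random at the observed density). A ratio above 1 marks a pattern that occurs more often than expected, and a ratio below 1 one that occurs less often.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy PPI network: protein -> set of GO-style attribute labels.
attributes = {
    "P1": {"kinase", "signaling"},
    "P2": {"kinase"},
    "P3": {"transport"},
    "P4": {"signaling", "transport"},
}
# Hypothetical undirected interactions between the proteins above.
edges = [("P1", "P2"), ("P1", "P4"), ("P3", "P4")]


def pair_patterns(u, v, attributes):
    """Unordered attribute pairs that an interaction between u and v would exhibit."""
    return {tuple(sorted((a, b))) for a in attributes[u] for b in attributes[v]}


def interestingness(edges, attributes):
    """Observed / expected count for every attribute-pair pattern.

    Assumed null model (not taken from the thesis): each pair of vertices is
    connected with probability equal to the observed edge density.
    """
    if not edges:
        raise ValueError("the graph needs at least one edge")
    all_pairs = list(combinations(sorted(attributes), 2))
    density = len(edges) / len(all_pairs)

    # How often each pattern actually spans an interaction.
    observed = Counter()
    for u, v in edges:
        observed.update(pair_patterns(u, v, attributes))

    # How often each pattern could occur if every vertex pair were connected.
    possible = Counter()
    for u, v in all_pairs:
        possible.update(pair_patterns(u, v, attributes))

    return {p: observed[p] / (possible[p] * density) for p in possible}


ratios = interestingness(edges, attributes)
for pattern, ratio in sorted(ratios.items()):
    print(pattern, round(ratio, 2))
```

In this toy graph the pair ("kinase", "transport") is underrepresented under the assumed null model, while ("kinase", "kinase") is overrepresented. A practical measure would also need a significance test and would have to handle multi-edge subgraphs, neither of which this sketch attempts.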

Bibliographic record

  • Author: Man, Lam Wai
  • Author affiliation: Hong Kong Polytechnic University (Hong Kong)
  • Degree-granting institution: Hong Kong Polytechnic University (Hong Kong)
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 196 p.
  • Total pages: 196
  • Original format: PDF
  • Language of text: English
