
Bayesian Network Learning and Applications in Bioinformatics.


Abstract

A Bayesian network (BN) is a compact graphical representation of the probabilistic relationships among a set of random variables. The advantages of the BN formalism include its rigorous mathematical basis, locality both in knowledge representation and in inference, and its natural way of handling uncertainty. Over the past decades, BNs have attracted growing interest in many areas, including bioinformatics, the field that develops mathematical and computational approaches for understanding biological processes.

In this thesis, I develop new methods for BN structure learning, with applications to the reconstruction and assessment of biological networks. The first application is reconstructing genetic regulatory networks (GRNs), where each gene is modeled as a node and an edge indicates a regulatory relationship between two genes. In this task, we are given time-series microarray gene expression measurements for tens of thousands of genes; these can be modeled as true gene expression levels mixed with noise from data generation, variability of the underlying biological systems, and other sources. We develop a novel BN structure learning algorithm for reconstructing GRNs.

The second application is a BN method for assessing protein-protein interactions (PPIs). PPIs underlie most biological mechanisms, and knowledge of PPIs is one of the most valuable resources for discovering gene and protein annotations. Recently developed high-throughput technologies have been used to reveal protein interactions experimentally in many organisms. However, high-throughput interaction data often contain a large number of spurious interactions. In this thesis, I develop a novel in silico model for PPI assessment, based on a BN that integrates heterogeneous data sources from different organisms.

The main contributions are:

1. A new concept for describing dynamic dependence relationships among random variables, relationships that are widespread in biological processes, such as those among genes and gene products in regulatory networks and signaling pathways. This concept leads to a novel algorithm for dynamic Bayesian network learning. Applied to time-series microarray gene expression data, it discovers some missing links in a well-known regulatory pathway, and these new causal relationships between genes are supported by evidence in the literature.

2. The discovery and theoretical proof of an asymptotic property of the K2 algorithm, a well-known efficient BN structure learning approach. This property is used to identify Markov blankets (MBs) in a Bayesian network and then to recover the BN structure. The resulting hybrid algorithm is evaluated on a benchmark regulatory pathway and obtains better results than some state-of-the-art Bayesian learning approaches.

3. A Bayesian-network-based integrative method that incorporates heterogeneous data sources from different organisms to predict PPIs in a target organism. The framework is applied to human PPI prediction and to the assessment of high-throughput PPI data, and our experiments reveal some interesting biological results.

4. A learning approach for networks based on TAN (Tree Augmented Naive Bayes), which offers computational simplicity and robustness for high-throughput PPI assessment. The empirical results show that our method outperforms naive Bayes and a manually constructed Bayesian network, and additionally demonstrate that sufficient information from model organisms can achieve high accuracy in PPI prediction.
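Contribution 2 builds on the K2 algorithm and on Markov blankets. The abstract does not describe the thesis's hybrid algorithm, so the sketch below is not that method. It is a minimal illustration, assuming complete discrete data stored as integer tuples, of the standard K2 procedure (the Cooper-Herskovits score combined with greedy parent addition under a fixed node ordering and a cap on the number of parents), followed by reading a Markov blanket (parents, children, and the children's other parents) off the learned DAG. The function names `k2_family_score`, `k2`, and `markov_blanket`, the `max_parents` default, and the toy data are hypothetical choices made for illustration.

```python
from math import lgamma


def k2_family_score(data, child, parents, arity):
    """Log Cooper-Herskovits (K2) score of one node given a candidate parent set.

    data    : list of tuples, each one complete discrete observation
    child   : column index of the node being scored
    parents : tuple of column indices of the candidate parents
    arity   : arity[i] is the number of states of variable i (coded 0..arity[i]-1)
    """
    r = arity[child]
    # N_ijk: how often the child takes state k under parent configuration j.
    counts = {}
    for row in data:
        j = tuple(row[p] for p in parents)
        counts.setdefault(j, [0] * r)[row[child]] += 1
    score = 0.0
    for n_ijk in counts.values():
        n_ij = sum(n_ijk)
        # log[(r-1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!).
        # Unseen parent configurations contribute log(1) = 0, so they are skipped.
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(n + 1) for n in n_ijk)
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: a node may only take parents that precede it in `order`."""
    parents = {}
    for pos, node in enumerate(order):
        current = ()
        best = k2_family_score(data, node, current, arity)
        candidates = set(order[:pos])
        while len(current) < max_parents and candidates:
            # Score every remaining predecessor and keep the single best addition.
            new_best, chosen = max(
                (k2_family_score(data, node, current + (cand,), arity), cand)
                for cand in candidates
            )
            if new_best <= best:  # no single addition improves the score: stop
                break
            best, current = new_best, current + (chosen,)
            candidates.remove(chosen)
        parents[node] = current
    return parents


def markov_blanket(parents, node):
    """Parents, children, and the children's other parents (spouses) of `node`."""
    children = {ch for ch, ps in parents.items() if node in ps}
    spouses = {p for ch in children for p in parents[ch]} - {node}
    return set(parents[node]) | children | spouses


# Toy data: three binary variables that always take the same value.
data = [(a, a, a) for a in (0, 1)] * 10
learned = k2(data, order=[0, 1, 2], arity=[2, 2, 2])
print(learned)
print(markov_blanket(learned, 1))
```

The node ordering and the parent cap come from K2 itself: restricting parents to earlier nodes guarantees an acyclic graph, and the greedy loop stops as soon as no single added parent raises the family score. The score is kept in log space via `lgamma` because the factorials overflow quickly for realistic sample sizes.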

Bibliographic details

  • Author

    Lin, Xiaotong

  • Author affiliation

    University of Kansas

  • Degree-granting institution: University of Kansas
  • Subjects: Bioinformatics; Computer science; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 120
  • Original format: PDF
  • Language: English
