Journal: PLoS Computational Biology

CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data

Abstract

Bayesian networks (BNs) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused on predicting a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBN) formalism, which is appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analyses on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and is available as MATLAB source code, under an open source license and by anonymous download, at http://www.cgbayesnets.com.
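
To make the conditional Gaussian idea concrete, the following is a minimal sketch of the smallest possible case: one discrete phenotype node with a single continuous child whose distribution is Gaussian given each phenotype value. It is written in Python for illustration only and is not the CGBayesNets API (the package itself is MATLAB); the function names `fit_cg_node` and `posterior`, and the toy measurements, are assumptions made up for this example. Predicting the phenotype from the continuous value is then a direct application of Bayes' rule, with no discretization of the continuous variable.

```python
import math
from collections import defaultdict


def fit_cg_node(labels, values):
    """Maximum-likelihood fit of P(class) and a Gaussian P(x | class).

    Hypothetical helper for illustration; returns
    {class: (prior, mean, variance)}.
    """
    groups = defaultdict(list)
    for y, x in zip(labels, values):
        groups[y].append(x)
    n = len(labels)
    params = {}
    for y, xs in groups.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        # Floor the variance so a constant feature does not divide by zero.
        params[y] = (len(xs) / n, mean, max(var, 1e-9))
    return params


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(params, x):
    """Inference step: P(class | x) is proportional to P(class) * N(x; mean, var)."""
    joint = {y: p * gaussian_pdf(x, m, v) for y, (p, m, v) in params.items()}
    total = sum(joint.values())
    return {y: j / total for y, j in joint.items()}


# Toy, made-up measurements (e.g. one metabolite level per sample).
labels = ["case", "case", "case", "control", "control", "control"]
values = [2.1, 1.9, 2.3, 0.2, -0.1, 0.1]

params = fit_cg_node(labels, values)
print(posterior(params, 2.0))  # probability mass concentrates on "case"
print(posterior(params, 0.0))  # probability mass concentrates on "control"
```

A full CGBN generalizes this by letting continuous nodes also depend linearly on continuous parents and by learning the network structure; the sketch covers only the per-class Gaussian and the inference step.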
