
Bayesian Additive Regression Trees using Bayesian Model Averaging

Abstract

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods, in which the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method that is popular for high-dimensional data is random forests, a machine learning algorithm that grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART, called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm that can handle high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n large p” scenario, which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish patients with cardiovascular disease from controls, and another to classify aggressive versus non-aggressive prostate cancer. We compare our results with those of the main competing methods. Open-source code written in R and Rcpp to run BART-BMA can be found at:
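The abstract describes two ideas: BART as a sum-of-trees model, and BART-BMA as an average over candidate models. The following is a minimal Python sketch of how those two ideas fit together. It is not taken from the BART-BMA R/Rcpp code, and it rests on two stated assumptions:

* Each candidate model is a list of binary regression trees whose leaf values are summed.
* Posterior model probabilities are approximated as proportional to exp(-BIC/2), with equal prior probability for each model. This is a standard BMA approximation, assumed here for illustration.

All names (`Node`, `predict_tree`, `predict_sum_of_trees`, `bma_weights`, `bma_predict`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Binary regression tree node (hypothetical structure for illustration).

    An internal node sends x to `left` when x[var] <= threshold, else to `right`.
    A leaf has var=None and stores its predicted value in `mu`.
    """
    var: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mu: float = 0.0


def predict_tree(node: Node, x: Sequence[float]) -> float:
    # Walk from the root down to a leaf and return that leaf's value.
    while node.var is not None:
        node = node.left if x[node.var] <= node.threshold else node.right
    return node.mu


def predict_sum_of_trees(trees: List[Node], x: Sequence[float]) -> float:
    # Sum-of-trees model: the fitted value is the sum of every tree's leaf value.
    return sum(predict_tree(tree, x) for tree in trees)


def bma_weights(bics: Sequence[float]) -> List[float]:
    # Assumed approximation: p(M_k | y) proportional to exp(-BIC_k / 2),
    # with equal prior probability for every model. Subtracting the smallest
    # BIC first avoids underflow without changing the normalized weights.
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(models: List[List[Node]], bics: Sequence[float],
                x: Sequence[float]) -> float:
    # Posterior-weighted average of each sum-of-trees model's prediction.
    weights = bma_weights(bics)
    return sum(w * predict_sum_of_trees(m, x) for w, m in zip(weights, models))
```

A worked example uses two stumps:

* Stump A splits on x[0] <= 0.5, with leaves -1 and 1.
* Stump B splits on x[1] <= 0, with leaves 0.5 and -0.5.

At x = [0.8, -1.0], the model [A, B] predicts 1.5 and the model [A] predicts 1.0. If their BIC values differ by 2 ln 3, the weights are 0.75 and 0.25, so the averaged prediction is 1.375.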